DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

Jiajun Zhou, Jiajun Wu, Yizhao Gao, Yuhao Ding, Chaofan Tao, Boyu Li, Fengbin Tu, Kwang-Ting Cheng, Hayden So, Ngai Wong

December 2023

Abstract

To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize DNN models into low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths ($<$ 8 bits). This work presents DyBit, an adaptive data representation with variable-length encoding. DyBit can dynamically adjust the precision and range of separate bit-fields to adapt to the distribution of DNN weights and activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy against speedup. Experimental results demonstrate that the inference accuracy with DyBit is 1.97% higher than the state of the art at 4-bit quantization, and the proposed framework can achieve up to 8.1$\times$ speedup compared with the original model.

Type

Journal article

Publication

In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

This paper was first accepted as a poster at DAC 2023.

AI Hardware

Jiajun Wu

PhD Student
