SqueezeBlock: A Transparent Weight Compression Scheme for Deep Neural Networks

Abstract

Modern Deep Neural Networks (DNNs) are notorious for their large memory footprint, which impacts not only the storage capacity requirement in resource-constrained embedded systems, but also the performance of an inference machine due to data movement. In this work, we demonstrate a transparent weight compression scheme, called SqueezeBlock, which effectively reduces the memory footprint of DNN models with only minimal impact on their accuracy and without the need for retraining. SqueezeBlock employs three steps, namely clustering, quantization, and block encoding, to compress the weights of DNN models, and relies on automatic design space exploration to derive the optimal encoding configuration. Custom hardware decoders can be generated automatically for seamless integration with the memory subsystem. Experiments on a range of DNNs show that SqueezeBlock can effectively compress the original fp32 weights by up to 4.88× to 6 bits per weight, with the loss of accuracy kept within 0.92% across the tested models.
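The following is a minimal sketch of the three-step pipeline described above, assuming k-means-style weight clustering and fixed-width index packing for illustration; the actual SqueezeBlock encoding configuration is derived by design space exploration and is not reproduced here.

```python
# Illustrative sketch only: cluster weights, quantize them to cluster
# indices, and group the indices into fixed-size blocks. Function names
# and parameters (n_clusters, block_size) are hypothetical.
import numpy as np

def compress_weights(weights: np.ndarray, n_clusters: int = 64, block_size: int = 16):
    """Compress a weight tensor into (codebook, packed index blocks, bits per weight)."""
    flat = weights.astype(np.float32).ravel()

    # Step 1: clustering -- a few iterations of 1-D k-means over the weights.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(10):
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()

    # Step 2: quantization -- replace every weight by its nearest cluster index.
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    bits = int(np.ceil(np.log2(n_clusters)))  # e.g. 6 bits for 64 clusters

    # Step 3: block encoding -- group indices into fixed-size blocks so a
    # decoder can fetch and expand each block independently.
    pad = (-len(idx)) % block_size
    blocks = np.pad(idx, (0, pad)).reshape(-1, block_size).astype(np.uint8)
    return centroids, blocks, bits

def decompress(centroids: np.ndarray, blocks: np.ndarray, original_size: int) -> np.ndarray:
    """Software model of the inverse mapping a hardware decoder would perform."""
    idx = blocks.ravel()[:original_size]
    return centroids[idx]
```

With 64 clusters, each weight is represented by a 6-bit index plus a shared codebook, which is consistent with the bits-per-weight figure quoted in the abstract; the real scheme additionally tunes the encoding per block rather than using a single fixed configuration.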

Publication
In 2023 International Conference on Field Programmable Technology
Jiajun Wu
PhD Student

My research interests include hardware accelerators, reconfigurable computing, and computer architecture.
