
TATAA: Programmable Mixed-Precision Transformer Acceleration with a Transformable Arithmetic Architecture

In this work, we have presented TATAA, a programmable FPGA accelerator for transformer models built on a novel transformable arithmetic architecture. Using TATAA, we demonstrate that both low-bitwidth integer (int8) and floating-point (bfloat16) operations can be implemented efficiently on the same underlying processing array. By transforming the array from a systolic mode for int8 MatMul to a SIMD mode for vectorized bfloat16 operations, we achieve end-to-end acceleration of modern transformer models, covering both linear and non-linear functions, with state-of-the-art performance and efficiency. In the future, we plan to explore more general FPGA implementations of TATAA with support for more devices (e.g., with or without HBM) and to enhance the flexibility of our compilation framework so that it can keep pace with rapidly evolving transformer models.
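The mixed-precision split described above can be illustrated with a minimal NumPy sketch, not the actual TATAA hardware or ISA: linear layers run as int8 MatMul with int32 accumulation (the systolic-mode workload), while non-linear functions such as softmax run in an emulated bfloat16 obtained by truncating the low 16 mantissa bits of float32 (the SIMD-mode workload). All function names and scale parameters here are illustrative assumptions.

```python
import numpy as np

def to_bfloat16(x):
    # Emulate bfloat16 by truncating the low 16 bits of the float32 encoding.
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (u & np.uint32(0xFFFF0000)).view(np.float32)

def int8_matmul(a, b, scale_a, scale_b):
    # Systolic-style int8 MatMul: int8 inputs, int32 accumulation,
    # then dequantization back to float using per-tensor scales.
    acc = a.astype(np.int32) @ b.astype(np.int32)
    return acc.astype(np.float32) * (scale_a * scale_b)

def softmax_bf16(x):
    # SIMD-style non-linear function evaluated in emulated bfloat16.
    x = to_bfloat16(x)
    e = to_bfloat16(np.exp(x - x.max(axis=-1, keepdims=True)))
    return to_bfloat16(e / e.sum(axis=-1, keepdims=True))
```

Keeping accumulation in int32 mirrors the usual practice for int8 arrays, where the narrow inputs feed wide accumulators; the bfloat16 rounding in the sketch is truncation, which is simpler than (and slightly less accurate than) round-to-nearest-even.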

DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

This paper has proposed a novel hardware-aware quantization framework, together with a fused mixed-precision accelerator, to efficiently support a distribution-adaptive data representation named DyBit. Its variable-length bit-fields enable DyBit to adapt to the tensor distributions in DNNs. Evaluation results show that DyBit-based quantization is effective even at very low bitwidths.
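To give the flavor of distribution-adaptive quantization, here is a toy sketch that is not the DyBit format itself: instead of variable-length bit-fields, it adapts the clipping range to the tensor's distribution via a percentile and quantizes to a fixed low bitwidth. The function names and the 99.9th-percentile choice are illustrative assumptions.

```python
import numpy as np

def adaptive_quantize(x, bits=4, pct=99.9):
    # Choose the clip range from the tensor's own distribution (percentile
    # of magnitudes), then quantize symmetrically to `bits` bits.
    clip = np.percentile(np.abs(x), pct)
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float values from the integer codes.
    return q.astype(np.float32) * scale
```

Percentile clipping trades a small amount of saturation on outliers for much finer resolution on the bulk of the distribution, which is the same intuition behind representations whose bit-fields adapt to the data.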

An Energy-efficient Deep Belief Network Processor Based on Heterogeneous Multi-core Architecture with Transposable Memory and On-chip Learning

This paper presents an energy-efficient DBN processor based on a heterogeneous multi-core architecture with transposable weight memory and on-chip local learning. In the future, we will focus on an ASIC implementation of the proposed DBN processor in a GALS architecture with a 7T/8T SRAM-based transposable memory design, to address the remaining bottlenecks and further improve throughput and energy efficiency.
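The role of a transposable weight memory can be seen in a minimal NumPy sketch, which is only an illustration of the access pattern, not the processor's implementation: the visible-to-hidden pass reads the weight matrix row-wise, while the hidden-to-visible reconstruction used during on-chip learning needs the transposed (column-wise) access to the same weights. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6)).astype(np.float32)  # hidden x visible weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def up_pass(v):
    # Visible -> hidden: row-major access to W.
    return sigmoid(W @ v)

def down_pass(h):
    # Hidden -> visible reconstruction: needs W transposed, i.e. the
    # column-major access pattern a transposable memory serves directly.
    return sigmoid(W.T @ h)
```

A memory that can be read in both orientations avoids either storing two copies of W or paying a strided (and bandwidth-wasting) access for one of the two passes.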

In Situ Aging-Aware Error Monitoring Scheme for IMPLY-Based Memristive Computing-in-Memory Systems

To address the aforementioned issues of the program-verify scheme, this paper proposes a novel in situ error monitoring scheme for IMPLY-based memristive computing-in-memory (CIM) systems.
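For readers unfamiliar with the underlying logic primitive, the following sketch shows what an IMPLY gate computes and why it is functionally complete; it is a behavioral model only, not the paper's monitoring scheme. Material implication (p → q) equals (NOT p) OR q, and together with a FALSE constant it composes into NAND.

```python
def imply(p, q):
    # Material implication (p -> q) = (not p) or q: the Boolean function
    # a memristive IMPLY gate realizes on two device states.
    return (not p) or q

def nand(p, q):
    # NAND from two IMPLY steps and a FALSE constant:
    # q -> 0 yields NOT q, then p -> (NOT q) yields NAND(p, q).
    return imply(p, imply(q, False))
```

Because NAND is universal, any logic function can be mapped onto sequences of IMPLY operations inside the memristor array, which is what makes per-step error monitoring in such systems worthwhile.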

Efficient Design of Spiking Neural Network with STDP Learning Based on Fast CORDIC

This paper presents a novel CORDIC-based spiking neural network (SNN) design with online STDP learning and high hardware efficiency. A system-level design and evaluation method for CORDIC SNNs is proposed to assess hardware efficiency across different CORDIC algorithm types and bit-width precisions.