MAICC: A Lightweight Many-core Architecture with In-Cache Computing for Multi-DNN Parallel Inference

Abstract

The growing complexity and diversity of neural networks in autonomous driving and intelligent robotics have spurred research into many-core architectures, which, compared to domain-specific architectures, offer the programming flexibility to simultaneously support parallel inference of multiple DNNs with different network structures and sizes. However, under tight area and power constraints, many-core architectures typically use lightweight scalar cores without vector units and thus struggle to meet the high-performance computing demands of multi-DNN parallel inference. To address this problem, we design an area- and energy-efficient many-core architecture that integrates a large number of lightweight processor cores implementing the RV32IMA ISA. The architecture leverages emerging SRAM-based computing-in-memory technology to implement vector instruction extensions by reusing memory cells in the data cache instead of conventional logic circuits. The data cache in each core can thus be reconfigured into a memory part and a computing part, with the latter tightly coupled to the core pipeline, enabling parallel execution of the basic RISC-V instructions and the extended multi-cycle vector instructions. Furthermore, we propose a corresponding execution framework that maps DNN models onto the many-core architecture using intra-layer and inter-layer pipelining, thereby supporting multi-DNN parallel inference. Experimental results show that the proposed MAICC architecture achieves 4.3× higher throughput and 31.6× higher energy efficiency than a CPU (Intel i9-13900K), and 1.8× higher energy efficiency than a GPU (RTX 4090), with only 4 MB of on-chip memory and a 28 mm² area.
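
The paper itself does not ship code, but the idea of exposing cache-resident compute through extended multi-cycle vector instructions can be sketched from the software side. The C fragment below is a minimal illustration under assumed encodings: the custom-0 opcode, the CSR address, the descriptor layout, and the names cache_set_compute_ways, cim_desc, and cim_vmac are all hypothetical and do not describe the actual MAICC ISA.

    #include <stdint.h>

    /* Hypothetical CSR that selects how many data-cache ways are reconfigured
     * as in-cache compute arrays (the "computing part"); 0x7C0 lies in the
     * custom read/write CSR range. Invented for illustration only. */
    static inline void cache_set_compute_ways(uint32_t ways)
    {
        asm volatile("csrw 0x7C0, %0" :: "r"(ways));
    }

    /* Descriptor for one in-cache vector multiply-accumulate; layout invented. */
    struct cim_desc {
        uint32_t src_a;  /* address of operand vector A, resident in compute ways */
        uint32_t src_b;  /* address of operand vector B */
        uint32_t dst;    /* address where results are accumulated */
        uint32_t len;    /* number of elements */
    };

    /* Launch a multi-cycle in-cache MAC described by *d. The instruction is
     * encoded through the GNU assembler's .insn directive in the custom-0
     * opcode space; the scalar pipeline can keep issuing base RV32IMA
     * instructions while the cache-side operation completes. */
    static inline uint32_t cim_vmac(const struct cim_desc *d)
    {
        uint32_t status;
        asm volatile(".insn r 0x0B, 0x0, 0x00, %0, %1, x0"
                     : "=r"(status)
                     : "r"(d)
                     : "memory");
        return status;
    }

In such a scheme, a per-core runtime would tile each assigned layer so that weight and activation slices fit in the compute-configured ways (intra-layer pipelining), while different cores hold consecutive layers and stream activations to one another (inter-layer pipelining), as the abstract's execution framework describes.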

Publication
In the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '23)
Weimin Zheng
Professor