Crossbar-aligned & integer-only neural network compression for efficient in-memory acceleration

Crossbar-based In-Memory Computing (IMC) accelerators preload the entire Deep Neural Network (DNN) into crossbars before inference. However, devices with limited crossbars cannot infer increasingly complex models. IMC-pruning can reduce the usage of crossbars, but current methods need expensive extr...

Bibliographic Details
Main Authors:	Huai, Shuo; Liu, Di; Luo, Xiangzhong; Chen, Hui; Liu, Weichen; Subramaniam, Ravi
Other Authors:	School of Computer Science and Engineering
Format:	Conference or Workshop Item
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering::Computing methodologies; In-Memory Computing; Pruning; Quantization; Neural Networks
Online Access:	https://hdl.handle.net/10356/165352
Institution:	Nanyang Technological University

Similar Items

CRIMP: compact & reliable DNN inference on in-memory processing via crossbar-aligned compression and non-ideality adaptation
by: Huai, Shuo, et al.
Published: (2023)

A comprehensive study on optimization techniques for AMR robots recognition models
by: Zheng, Hao Peng
Published: (2025)

You only search once: on lightweight differentiable architecture search for resource-constrained embedded platforms
by: Luo, Xiangzhong, et al.
Published: (2023)

EdgeCompress: coupling multi-dimensional model compression and dynamic inference for EdgeAI
by: Kong, Hao, et al.
Published: (2023)

Smart scissor: coupling spatial redundancy reduction and CNN compression for embedded hardware
by: Kong, Hao, et al.
Published: (2023)

An efficient sparse LSTM accelerator on embedded FPGAs with bandwidth-oriented pruning
by: Li, Shiqing, et al.
Published: (2023)

iMAT: energy-efficient in-memory acceleration for ternary neural networks with sparse dot product
by: Zhu, Shien, et al.
Published: (2023)

Designing efficient DNNs via hardware-aware neural architecture search and beyond
by: Luo, Xiangzhong, et al.
Published: (2022)

An Energy-Efficient Digital ReRAM-Crossbar-Based CNN With Bitwise Parallelism
by: Ni, Leibin, et al.
Published: (2017)

Evaluating the merits of ranking in structured network pruning
by: Sharma, Kuldeep, et al.
Published: (2021)

FAT: an in-memory accelerator with fast addition for ternary weight neural networks
by: Zhu, Shien, et al.
Published: (2022)

Inference acceleration of large language models
by: Zhang, Boyu
Published: (2024)

Efficient and lightweight quantized compressive sensing using μ-law
by: Pudi, Vikramkumar, et al.
Published: (2020)

Work-in-progress: what to expect of early training statistics? An investigation on hardware-aware neural architecture search
by: Luo, Xiangzhong, et al.
Published: (2023)

SurgeNAS: a comprehensive surgery on hardware-aware differentiable neural architecture search
by: Luo, Xiangzhong, et al.
Published: (2023)

ACSL : adaptive correlation-driven sparsity learning for deep neural network compression
by: He, Wei, et al.
Published: (2021)

iMAD: an in-memory accelerator for AdderNet with efficient 8-bit addition and subtraction operations
by: Zhu, Shien, et al.
Published: (2022)

LESS IS MORE: IMPROVING THE PERFORMANCE OF LEARNING MODELS WITH FEWER FEATURES OR FEWER PARAMETERS
by: LIU SHIYU
Published: (2023)

TOWARD A RUNTIME PROGRAMMABLE SPIKING NEURAL NETWORK HARDWARE ACCELERATOR WITH ON-CHIP LEARNING
by: NGUYEN NGOC NHU THAO
Published: (2023)

Self-selective multi-terminal memtransistor crossbar array for in-memory computing
by: Xuewei Feng, et al.
Published: (2023)

EvoLP: self-evolving latency predictor for model compression in real-time edge systems
by: Huai, Shuo, et al.
Published: (2023)

On hardware-aware design and optimization of edge intelligence
by: Huai, Shuo, et al.
Published: (2023)

EMNAPE: efficient multi-dimensional neural architecture pruning for EdgeAI
by: Kong, Hao, et al.
Published: (2023)

An efficient gustavson-based sparse matrix-matrix multiplication accelerator on embedded FPGAs
by: Li, Shiqing, et al.
Published: (2023)

MUGNoC: a software-configured multicast-unicast-gather NoC for accelerating CNN dataflows
by: Chen, Hui, et al.
Published: (2023)

ZeroBN : learning compact neural networks for latency-critical edge systems
by: Huai, Shuo, et al.
Published: (2022)

HSCoNAS : hardware-software co-design of efficient DNNs via neural architecture search
by: Luo, Xiangzhong, et al.
Published: (2022)

EdgeNAS: discovering efficient neural architectures for edge systems
by: Luo, Xiangzhong, et al.
Published: (2023)

Bringing AI to edge : from deep learning's perspective
by: Liu, Di, et al.
Published: (2022)

A lightweight approach to crowd density estimation: leveraging network pruning for model compression
by: Lye, Jin Kai
Published: (2025)

Towards efficient convolutional neural network for embedded hardware via multi-dimensional pruning
by: Kong, Hao, et al.
Published: (2023)

An organic-based diode-memory device with rectifying property for crossbar memory array applications
by: Teo, E.Y.H., et al.
Published: (2014)

Utility distribution matters: enabling fast belief propagation for multi-agent optimization with dense local utility function
by: Deng, Yanchen, et al.
Published: (2022)

A novel method for wavelet quantization of noisy speech
by: Madhukumar, A. S., et al.
Published: (2020)

Optimized data reuse via reordering for sparse matrix-vector multiplication on FPGAs
by: Li, Shiqing, et al.
Published: (2022)

Collate: collaborative neural network learning for latency-critical edge systems
by: Huai, Shuo, et al.
Published: (2023)

Pruning Blocks for CNN Compression and Acceleration via Online Ensemble Distillation
by: Wang, Z., et al.
Published: (2022)

A connectionist approach to generating oblique decision trees
by: Setiono, R., et al.
Published: (2013)

Analysis of Hidden Representations by Greedy Clustering
by: Setiono, R., et al.
Published: (2014)

Effective data mining using neural networks
by: Lu, H., et al.
Published: (2014)