Design of low-power gesture recognition system and self-organizing map processors

Bibliographic Details
Main Author: Lu, Yuncheng
Other Authors: Kim Tae Hyoung
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/177755
Institution: Nanyang Technological University
Description
Summary: Machine learning applications have gained substantial recognition in recent years due to their significant impact on sectors such as healthcare, automation, and artificial intelligence. While these applications bring impressive advancements, they also impose heavy computational loads and power requirements. Traditional processing systems are often ill-equipped to cope with these increasing demands, leading to inefficiencies and constraints. This, coupled with the surge in demand for portable and energy-constrained devices, necessitates the development of low-power, high-performance hardware accelerators for machine learning applications. This thesis addresses these challenges by contributing to the design and development of low-power hardware accelerators for machine learning, with a specific focus on hand gesture recognition (HGR) and self-organizing map (SOM) systems. The first part of this thesis confronts the high power consumption and unstable performance of contemporary HGR systems intended for wearable devices. We propose two ultra-low-power, highly accurate HGR systems, substantiated by measurement results from test chips. In recent years, gesture recognition has become a favored method for human-computer interaction (HCI), especially in smart wearable devices. However, recent HGR systems suffer from substantial power consumption or unsatisfactory accuracy, which constrains their applicability in wearable devices. In Chapter 3, we propose a low-power HGR system that features computing-efficient classifiers and an error-tolerant majority voting scheme for final decisions. This system can recognize 6 static gestures and 24 dynamic gestures with peak accuracies of 95% and 94.9%, respectively.
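To illustrate the general idea behind majority voting over classifier outputs, the following is a minimal Python sketch of a sliding-window vote on per-frame labels. It is not the scheme implemented in the thesis, whose details are not given in this summary; the function name `majority_vote`, the `window` parameter, and the gesture labels are assumptions made only for illustration.

```python
from collections import Counter, deque


def majority_vote(labels, window=5):
    """Smooth a stream of per-frame labels with a sliding-window majority vote.

    Hypothetical helper: `window` is an assumed parameter, not a value
    taken from the thesis.
    """
    recent = deque(maxlen=window)  # holds only the last `window` labels
    smoothed = []
    for label in labels:
        recent.append(label)
        # Pick the label that occurs most often in the current window.
        # On a tie, the label that appears first in the window wins.
        winner, _ = Counter(recent).most_common(1)[0]
        smoothed.append(winner)
    return smoothed


# A single misclassified frame ("palm") is outvoted by its neighbours.
frames = ["fist", "fist", "palm", "fist", "fist"]
print(majority_vote(frames, window=3))
```

A vote of this kind tolerates isolated per-frame errors at the cost of a short decision delay that grows with the window length.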
The hardware architecture is optimized for high energy efficiency through 1) adaptive activation of the power-consuming blocks, 2) a neural network processing engine with a high data reuse rate, and 3) an error-tolerant sequence analyzer. Measurements of the test chip, fabricated in 65-nm CMOS technology, show a lowest power consumption of 184 μW at a 0.6 V supply voltage. It consumes 48 μs and 0.32 μs for static and dynamic gestures, respectively, surpassing the state of the art. In Chapter 4, to improve recognition accuracy against complex backgrounds and reduce the processing latency of the system, we propose a low-power HGR system with three algorithmic improvements: 1) a color- and depth-based hand segmentation method, 2) bi-directional convolution-based feature extraction, and 3) an iteration-free feature clustering algorithm. To implement these HGR algorithms with high power efficiency, the hardware architecture adopts three optimizations: 1) continuous low-power control of the computing-intensive modules, 2) a fully pipelined architecture for feature extraction and data clustering, and 3) a computing-efficient cluster combiner without redundant comparisons. The prototype chip, fabricated in 65-nm CMOS technology, achieves a lowest power of 181 μW at a 0.58 V supply voltage. It can recognize 9 static gestures and 20 dynamic gestures with average accuracies of 94.4% and 98.6%, respectively. Additionally, the fingertips can be tracked with an average error of 1.3 pixels. The second part introduces a new hardware architecture that addresses the high power consumption of existing SOM hardware and improves the system’s scalability. SOM is a versatile unsupervised machine learning algorithm that is widely used in data visualization, image quantization, and pattern recognition.
However, the massive vector computation in SOM leads to intensive memory access, which causes high power consumption. In Chapter 5, we propose a hardware SOM based on computing-in-memory, which performs part of the computation inside the memory. As a result, compared with conventional SOM accelerators, memory access during the recall and training stages of the SOM is reduced by 50% and 80%, respectively. Moreover, neurons with extremely low update rates, namely dead neurons, introduce redundant computation and signal toggling and thus waste energy. We therefore propose a dead neuron pruning scheme that monitors the update status of each neuron and prunes the inactive ones at runtime, which reduces power by up to 12.5%. Additionally, a scalable architecture is proposed to support SOM networks of different sizes. The test chip, fabricated in 65-nm CMOS technology, achieves a lowest power of 2 mW at a 0.96 V supply voltage. Its peak training and recall energy efficiencies are 350.7 GCUPS/W and 474 GCPS/W, respectively, surpassing the state of the art.
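The recall, training, and pruning steps mentioned above can be sketched in software as follows. This is a generic one-dimensional SOM in plain Python, not the computing-in-memory hardware or the exact pruning rule of the thesis. The names `recall`, `train_epoch`, and `prune_dead`, and the parameters `lr`, `radius`, and `min_updates`, are assumptions chosen for illustration.

```python
def sq_dist(w, x):
    """Squared Euclidean distance between a weight vector and an input."""
    return sum((wi - xi) ** 2 for wi, xi in zip(w, x))


def recall(weights, x, active):
    """Recall: index of the closest active neuron (best-matching unit).

    Assumes at least one neuron is still active.
    """
    candidates = (i for i in range(len(weights)) if active[i])
    return min(candidates, key=lambda i: sq_dist(weights[i], x))


def train_epoch(weights, samples, active, updates, lr=0.5, radius=1):
    """Training: pull the best-matching unit and its neighbours toward each sample.

    `updates[i]` counts how often neuron i was updated (assumed bookkeeping).
    """
    for x in samples:
        bmu = recall(weights, x, active)
        lo, hi = max(0, bmu - radius), min(len(weights), bmu + radius + 1)
        for i in range(lo, hi):
            if active[i]:
                weights[i] = [wi + lr * (xi - wi) for wi, xi in zip(weights[i], x)]
                updates[i] += 1


def prune_dead(active, updates, min_updates=1):
    """Mark neurons updated fewer than `min_updates` times as inactive (assumed rule)."""
    for i, count in enumerate(updates):
        if count < min_updates:
            active[i] = False


# Three neurons on a 1-D map; the third is far from all samples.
weights = [[0.0], [1.0], [10.0]]
active = [True, True, True]
updates = [0, 0, 0]
train_epoch(weights, [[0.1], [0.9], [0.2]], active, updates, radius=0)
prune_dead(active, updates)
print(updates, active)
```

In this sketch, a pruned neuron is skipped in every later distance search and weight update, which is the software analogue of avoiding redundant computation for neurons that are rarely or never updated.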