CONVOLUTION ACCELERATOR DESIGN FOR SPARSE-DATA INFERENCE ON THE TINY YOLO V3 MODEL

Bibliographic Details
Main Author: Noor Endrawati, Devi
Format: Thesis
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/63548
Institution: Institut Teknologi Bandung
Summary: The Convolutional Neural Network (CNN) algorithm is widely used in modern AI systems and has been applied across many technologies, including edge devices. CNNs are commonly used for image processing tasks such as detecting and classifying objects in an image. Many related studies have developed these algorithms to increase detection accuracy by building large networks; however, the large network size challenges the throughput and energy efficiency of the hardware that runs it.

The Tiny YOLO V3 model is a CNN-based architecture for real-time object detection. Hardware that runs Tiny YOLO V3 for real-time detection generates significant data movement. On the software side, Tiny YOLO V3 can exploit correlations in the data or eliminate weights through model compression techniques, namely pruning and quantization. Pruning retains only the best weights and connections of a Tiny YOLO V3 network, so the resulting weight data becomes sparse, containing many zero values; quantization is then applied to reduce the size of the weight data. Convolution is the inference operation that consumes these compressed weights.

Since pruning Tiny YOLO V3 yields sparsity levels of up to 75% in several layers, a conventional convolution during inference would perform many multiplications by zero, which is inefficient. A hardware device is therefore designed that performs the convolution while skipping zero-valued data and processing only non-zero data. In this study, a convolution accelerator architecture is designed so that the sparse weight data entering the convolution is matched with the required input fmap data; an index is needed to select the fmap inputs. The designed architecture selects fmap input data based on this index and maps the resulting data so that the output can be used by the next stage. All zero-valued weights are removed after training by pruning, so the weights entering the accelerator stream through the convolution without delay. By performing the convolution only with non-zero weight inputs and with fmap inputs selected by the indexing process, the convolution becomes faster and more efficient.

The result of this research is a convolution accelerator design for the compressed sparse data produced by training Tiny YOLO V3 with pruning and quantization. Design tests show that the accelerator delivers the convolution output faster when applied to the pruned sparse data. Computed over all Tiny YOLO V3 layers, the designed accelerator reduces the convolution workload by 56% compared with convolution on dense weight data. The weight data required for convolution with this accelerator is 2.5 MB, which is smaller than the 4.7 MB of weight data produced by pruning and quantization alone.
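
To make the zero-skipping idea concrete, below is a minimal software sketch in Python/NumPy. The function and variable names are illustrative, and the thesis itself describes a hardware design, not software: the sketch pre-indexes the non-zero weights left by pruning and, for each output position, gathers only the matching fmap inputs, mirroring the accelerator's index-based fmap selection.

```python
import numpy as np

def sparse_conv2d(fmap, weights, stride=1):
    """Zero-skipping 2-D convolution (valid padding, single channel).

    Hypothetical sketch: only the non-zero weights produced by pruning
    are indexed, and for each output position the matching fmap inputs
    are gathered -- a software analogue of the accelerator's
    index-based fmap selection. (Cross-correlation form, as is usual
    in CNN frameworks.)
    """
    kh, kw = weights.shape
    # Index step: record (row, col, value) for non-zero weights only.
    nz = [(r, c, weights[r, c])
          for r in range(kh) for c in range(kw)
          if weights[r, c] != 0]

    oh = (fmap.shape[0] - kh) // stride + 1
    ow = (fmap.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow), dtype=fmap.dtype)

    for i in range(oh):
        for j in range(ow):
            acc = 0
            # Multiply-accumulate over non-zero weights only: at 75%
            # sparsity this skips three of every four multiplications.
            for r, c, w in nz:
                acc += w * fmap[i * stride + r, j * stride + c]
            out[i, j] = acc
    return out

# Example: a 3x3 kernel pruned so only 2 of 9 weights are non-zero.
fmap = np.arange(25, dtype=np.int32).reshape(5, 5)
kernel = np.zeros((3, 3), dtype=np.int32)
kernel[0, 0], kernel[2, 1] = 3, -2
print(sparse_conv2d(fmap, kernel))
```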
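
Likewise, here is a rough sketch of how pruning plus quantization shrink the weight footprint. The bitmap-plus-int8 layout is an assumption for illustration (the abstract does not specify the thesis's actual storage format): each zero weight costs one bitmap bit instead of a full 32-bit word.

```python
import numpy as np

def compress_pruned_weights(dense_w, scale):
    """Compress a pruned float32 kernel: a 1-bit occupancy mask marks
    the positions of non-zero weights, and only those weights are
    stored, quantized to int8. Illustrative layout only.
    """
    flat = dense_w.ravel()
    mask = flat != 0                      # 1 bit per weight position
    q = np.clip(np.round(flat[mask] / scale), -128, 127).astype(np.int8)
    return mask, q

# A toy layer: 16x16 3x3 kernels, pruned to ~75% sparsity.
rng = np.random.default_rng(0)
dense = rng.standard_normal((16, 16, 3, 3)).astype(np.float32)
dense[rng.random(dense.shape) < 0.75] = 0.0

mask, q = compress_pruned_weights(dense, scale=0.02)
dense_bytes = dense.size * 4              # float32 storage
packed_bytes = dense.size // 8 + q.size   # bitmap bits + int8 non-zeros
print(f"dense: {dense_bytes} B  compressed: {packed_bytes} B")
```

The same accounting explains the reported sizes: storing only quantized non-zero weights plus a compact index yields a smaller footprint (2.5 MB) than keeping every position of the pruned, quantized tensor (4.7 MB).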