DESIGN OF A CONVOLUTION ACCELERATOR FOR SPARSE-DATA INFERENCE ON THE TINY YOLO V3 MODEL
Main Author:
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/63548
Institution: Institut Teknologi Bandung
Summary: The Convolutional Neural Network (CNN) algorithm is widely used in modern AI systems and has been applied in many technologies, among them edge devices. In AI, CNNs are widely used for image-processing tasks such as detecting and classifying objects in an image. Many related studies have developed these algorithms to increase detection accuracy by building large networks. However, the large size of such networks challenges the throughput and energy efficiency of the hardware that runs them. The Tiny Yolo V3 model is a CNN-based architecture for real-time object detection. Hardware that runs the Tiny Yolo V3 architecture for real-time image detection generates significant data movement. On the software side, Tiny Yolo V3 can exploit correlations in the data or eliminate weights through model compression techniques, namely pruning and quantization. Pruning keeps only the best weights and connections of a Tiny Yolo V3 network; as a result, the weight data become sparse, with many zero values. Quantization is then carried out to reduce the size of the weight data.
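The abstract does not state the pruning criterion or the quantization scheme used; purely as a minimal sketch, assuming magnitude-based pruning to a target sparsity and symmetric per-tensor int8 quantization, the compression step might look like this:

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.75):
    """Zero out the smallest-magnitude weights until the target
    fraction of zeros (sparsity) is reached."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) > threshold, weights, 0.0)

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8."""
    scale = np.abs(weights).max() / 127.0
    return np.round(weights / scale).astype(np.int8), scale

# Example: a 3x3, 16-in/16-out conv kernel pruned to ~75% zeros, then quantized.
w = np.random.randn(3, 3, 16, 16).astype(np.float32)
w_sparse = prune_by_magnitude(w, sparsity=0.75)
w_q, scale = quantize_int8(w_sparse)
print("sparsity after pruning:", float(np.mean(w_q == 0)))
```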
The inference stage that uses the compressed weight data is convolution. Since the pruned Tiny Yolo V3 weights reach a sparsity level of up to 75% in several layers, a straightforward convolution during inference would perform many multiplications with zero values, which is inefficient. Therefore, a hardware device is designed that performs convolution by skipping zero-valued data and processing only non-zero data. In this study, a convolution accelerator architecture is designed so that the sparse weight data entering the convolution is matched to the required fmap input data; an index is needed to select the fmap inputs. The designed architecture selects fmap input data based on the index and maps the resulting data so that the output can be used by the next stage.
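To illustrate the zero-skipping idea (the index format and datapath of the actual accelerator are not detailed in the abstract, so the flat-index encoding below is an assumption), a kernel can be reduced to its non-zero weights plus the indices that select the matching fmap inputs:

```python
import numpy as np

def compress_kernel(kernel):
    """Keep only the non-zero weights together with their flat indices,
    so only the matching fmap inputs need to be fetched."""
    flat = kernel.ravel()
    idx = np.flatnonzero(flat)
    return flat[idx], idx

def sparse_window_mac(fmap_window, values, idx):
    """Multiply-accumulate over non-zero weights only: the index stream
    selects the fmap inputs, and zero weights are never touched."""
    return int(np.dot(values.astype(np.int32), fmap_window.ravel()[idx]))

# Example: a 3x3 kernel pruned so only 2 of 9 weights remain.
kernel = np.array([[0, 2, 0],
                   [0, 0, 5],
                   [0, 0, 0]], dtype=np.int8)
window = np.arange(9, dtype=np.int32).reshape(3, 3)

values, idx = compress_kernel(kernel)          # values=[2, 5], idx=[1, 5]
assert sparse_window_mac(window, values, idx) == \
       int((kernel.astype(np.int32) * window).sum())   # both give 27
```

In hardware, the index stream would drive the fmap address generation so that only the selected activations are fetched, which is what would allow the compressed weight stream to flow through the pipeline without stalling on zeros.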
All weights with a value of zero are removed after training by pruning, so the weight stream entering the accelerator flows through the convolution process without delay. By performing convolution only on non-zero weight inputs, with fmap inputs selected by the indexing process, the convolution becomes faster and more efficient. The result of this research is a convolution accelerator design for the compressed sparse data produced by training Tiny Yolo V3 with pruning and quantization. The design tests show that the accelerator produces the convolution output faster when it is fed pruned sparse data. Calculated over all Tiny Yolo V3 layers, the designed accelerator reduces the convolution workload by 56% compared to convolution with dense weight data. The weight data required for convolution with this accelerator amounts to 2.5 MB, smaller than the 4.7 MB of weight data produced by pruning and quantization alone.
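The abstract does not describe the storage format behind the 2.5 MB figure; as a back-of-the-envelope illustration only (the layer size and field widths below are assumptions, not the thesis's numbers), storing non-zero values plus short indices undercuts a dense quantized tensor once sparsity is high enough:

```python
def compressed_size_bytes(n_weights, sparsity, value_bits=8, index_bits=4):
    """Bytes needed for a value+index sparse encoding. index_bits=4
    assumes a short relative index (distance to the next non-zero),
    as in run-style sparse formats; real formats vary."""
    nnz = int(n_weights * (1.0 - sparsity))
    return nnz * (value_bits + index_bits) // 8

# Hypothetical layer: 1,000,000 int8 weights at 75% sparsity.
n = 1_000_000
dense_bytes = n                                # dense int8: 1,000,000 B
sparse_bytes = compressed_size_bytes(n, 0.75)  # 375,000 B
print(dense_bytes, sparse_bytes)
```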