DESIGN AND REALIZATION OF LOW COMPLEXITY DEEP LEARNING MODEL FOR REAL-TIME KEYWORD SPOTTING APPLICATION USING JETSON XAVIER NX™
Main Author: | |
---|---|
Format: | Thesis |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/80265 |
Institution: | Institut Teknologi Bandung |
Summary: | Keyword Spotting (KWS) is the task of recognizing spoken command words from
a stream of audio. The main challenge in building a real-time KWS system is
processing the audio fast enough to respond in real time, which makes edge
computing, where the audio is processed on the device itself, a suitable option.
An edge-computing KWS system requires an AI model that is accurate yet simple,
and the choice of edge device is equally crucial. This thesis builds a KWS system
on the Jetson Xavier NX™. The system uses SpectroNet, a low-complexity hybrid
CNN-LSTM architecture designed by the author, which achieves 93.33% accuracy with
a total of 89,241 parameters. The Jetson Xavier NX™ was chosen as the edge device
because of its high computational power for an embedded platform. The
implementation preserves accuracy: the model shows no accuracy drop between the
PC and the Jetson Xavier NX™. To increase inference speed, the model is further
optimized with the TensorRT™ library. The optimization proves effective, reducing
the total number of operations performed in SpectroNet by 52.57% with FP32
precision and by 53.97% with FP16 precision. It also speeds up the model by 10%
in FP32 precision mode and by 14.75% in FP16 precision mode. However, the
optimization causes a slight accuracy drop of 0.33%, which is considered
negligible compared with the performance gain that TensorRT™ provides. |
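The summary describes KWS as recognizing command words from a continuous stream of audio. One common way to feed such a stream to a classifier with a fixed input size is a sliding window. The sketch below shows this under assumed settings (16 kHz audio, 1 s windows, 0.5 s hop) that are not taken from the thesis; the function name `sliding_windows` and the constants are hypothetical.

```python
# Minimal sketch: split a continuous audio stream into overlapping fixed-length
# windows, each of which would be passed to a KWS classifier.
# SAMPLE_RATE, WINDOW_S and HOP_S are hypothetical values, not the thesis's settings.
from typing import Iterator, List

SAMPLE_RATE = 16_000  # samples per second (assumed)
WINDOW_S = 1.0        # window length in seconds (assumed)
HOP_S = 0.5           # time between consecutive window starts in seconds (assumed)


def sliding_windows(stream: List[float],
                    sample_rate: int = SAMPLE_RATE,
                    window_s: float = WINDOW_S,
                    hop_s: float = HOP_S) -> Iterator[List[float]]:
    """Yield full-length windows; a trailing partial window is dropped."""
    win = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    for start in range(0, len(stream) - win + 1, hop):
        yield stream[start:start + win]


if __name__ == "__main__":
    audio = [0.0] * (3 * SAMPLE_RATE)  # 3 s of silence as a stand-in signal
    windows = list(sliding_windows(audio))
    print(len(windows), len(windows[0]))  # 5 16000
```

With a hop of `HOP_S` seconds, each window must be classified in less than `HOP_S` seconds for the system to keep pace with incoming audio, which is one concrete way to read the real-time requirement.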
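The summary gives SpectroNet's size as 89,241 parameters. The sketch below shows how such a total is tallied layer by layer for a hybrid CNN-LSTM, using standard per-layer formulas. The layer sizes, the 12-class output, and the helper names are hypothetical; SpectroNet's actual architecture is not described in this summary, so the total here does not reproduce 89,241.

```python
# Sketch: tally trainable parameters of a small, HYPOTHETICAL hybrid CNN-LSTM.
# Layer sizes are illustrative only; they are not SpectroNet's configuration.


def conv2d_params(k_h: int, k_w: int, in_ch: int, out_ch: int) -> int:
    # one k_h x k_w x in_ch kernel plus one bias per output channel
    return (k_h * k_w * in_ch + 1) * out_ch


def lstm_params(input_dim: int, units: int) -> int:
    # 4 gates, each with input weights, recurrent weights and a single bias
    # (Keras convention; PyTorch keeps two bias vectors per gate)
    return 4 * ((input_dim + units) * units + units)


def dense_params(in_dim: int, out_dim: int) -> int:
    # weight matrix plus one bias per output unit
    return (in_dim + 1) * out_dim


layers = {
    "conv1 3x3, 1->16": conv2d_params(3, 3, 1, 16),
    "conv2 3x3, 16->32": conv2d_params(3, 3, 16, 32),
    # 128 = 32 channels x 4 frequency bins per time step after reshaping (assumed)
    "lstm 128->64": lstm_params(128, 64),
    # 12 keyword classes (assumed)
    "dense 64->12": dense_params(64, 12),
}

if __name__ == "__main__":
    for name, n in layers.items():
        print(f"{name:>20}: {n:,}")
    print(f"{'total':>20}: {sum(layers.values()):,}")  # total: 54,988
```

Pooling and reshape layers add no parameters, so the convolutional front end can shrink the per-frame feature size cheaply before the LSTM, whose parameter count grows linearly with its input size and quadratically with its hidden size.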
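The summary reports relative reductions in operation count and relative speedups. The sketch below shows the arithmetic for both using hypothetical latency figures. A "speedup" can be expressed either as a reduction in latency or as an increase in throughput, and the two percentages differ for the same measurement; the summary does not state which definition the thesis uses. The helper names are hypothetical.

```python
# Sketch of relative-change arithmetic; all input values are hypothetical.


def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent (e.g. op counts, latency)."""
    return (before - after) / before * 100.0


def throughput_gain(latency_before: float, latency_after: float) -> float:
    """Relative increase in inferences per second when latency drops, in percent."""
    return (latency_before / latency_after - 1.0) * 100.0


if __name__ == "__main__":
    before_ms, after_ms = 10.0, 9.0  # hypothetical per-inference latencies
    print(f"latency reduction: {percent_reduction(before_ms, after_ms):.2f}%")  # 10.00%
    print(f"throughput gain:   {throughput_gain(before_ms, after_ms):.2f}%")    # 11.11%
```

The same `percent_reduction` formula applies to operation counts: a model whose operation count falls from N to N/2 shows a 50% reduction.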
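TensorRT™'s FP16 precision mode allows layers to run in IEEE 754 half precision (16 bits, with a 10-bit stored significand) instead of FP32. The sketch below uses Python's standard `struct` module (format code `'e'`) to show the rounding this introduces on a few arbitrary example values. It illustrates precision loss in general and does not reproduce the thesis's measurements; the summary reports a 0.33% accuracy drop after optimization without attributing it to a specific cause. The helper name `to_fp16` is hypothetical.

```python
# Sketch: round Python floats (FP64) to IEEE 754 half precision (FP16)
# to show the rounding error FP16 inference can introduce.
# The example values are arbitrary, not SpectroNet weights.
import struct


def to_fp16(x: float) -> float:
    """Return x rounded to the nearest half-precision value.

    struct raises OverflowError if x is too large to represent in FP16.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    for w in [0.1, -0.3333, 1.0e-5, 2.718281828, 2049.0]:
        h = to_fp16(w)
        print(f"{w:>12.8g} -> {h:>14.10g}  (abs error {abs(w - h):.2e})")
```

For example, 0.1 becomes 0.0999755859375 and 2049.0 becomes 2048.0, because FP16 cannot represent every integer above 2048.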