DESIGN AND REALIZATION OF LOW COMPLEXITY DEEP LEARNING MODEL FOR REAL-TIME KEYWORD SPOTTING APPLICATION USING JETSON XAVIER NX™

Bibliographic Details
Main Author: Amadeus, Clarence
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/80265
Institution: Institut Teknologi Bandung
Description
Summary: Keyword spotting (KWS) is the task of recognizing spoken command words in a stream of audio. The main challenge in building a KWS system is meeting real-time performance requirements, which makes edge computing a suitable option. A system built on the edge computing paradigm needs an AI model that is accurate yet simple, and the choice of edge device is also crucial. In this thesis, the author builds a KWS system on the Jetson Xavier NX™. The AI model used in the system is SpectroNet, a low-complexity hybrid CNN-LSTM architecture designed by the author. It achieves 93.33% accuracy with a total of 89,241 parameters. The Jetson Xavier NX™ was chosen as the edge device because of its strong computational power for an embedded device. The implementation preserves accuracy: there is no accuracy drop between the model running on a PC and on the Jetson Xavier NX™. To improve the speed of the system, the TensorRT™ library is used to further optimize the model. The optimization is effective, reducing the total number of operations performed in SpectroNet by 52.57% in FP32 precision mode and by 53.97% in FP16 precision mode. The model is also sped up by 10% in FP32 precision mode and by 14.75% in FP16 precision mode. However, accuracy drops slightly, by 0.33%, after optimization. This drop is considered negligible compared to the performance boost that TensorRT™ provides.
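
To put the reported percentages in perspective, the following is a minimal Python sketch that converts them into the fraction of operations remaining and an equivalent speedup factor. It rests on an assumption the summary does not confirm: that "sped up by N%" means inference time fell by N% relative to the unoptimized model. The function names and the REPORTED dictionary are hypothetical and used only for illustration; the numbers are the ones quoted in the summary.

```python
def remaining_fraction(percent_reduction: float) -> float:
    """Fraction of the baseline that is left after a percentage reduction."""
    if not 0.0 <= percent_reduction < 100.0:
        raise ValueError("percent_reduction must be in [0, 100)")
    return 1.0 - percent_reduction / 100.0


def speedup_factor(percent_time_reduction: float) -> float:
    """Speedup factor implied by a percentage drop in inference time.

    ASSUMPTION: the reported speedup is a reduction in inference time.
    If it were a throughput gain instead, the factor would simply be
    1 + percent / 100.
    """
    return 1.0 / remaining_fraction(percent_time_reduction)


# Percentages quoted in the summary, keyed by TensorRT precision mode.
REPORTED = {
    "FP32": {"ops_reduction": 52.57, "time_reduction": 10.0},
    "FP16": {"ops_reduction": 53.97, "time_reduction": 14.75},
}

for mode, figures in REPORTED.items():
    ops_left = remaining_fraction(figures["ops_reduction"])
    factor = speedup_factor(figures["time_reduction"])
    print(f"{mode}: {ops_left:.2%} of operations remain, "
          f"speedup factor about {factor:.3f}x")
```

Under that reading, the reduction in operations is much larger than the reduction in inference time, which is consistent with the summary reporting the two figures separately.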