Small footprint model for noisy far-field keyword spotting
Main Author: | Pang, Jin Hui |
---|---|
Other Authors: | Chng Eng Siong |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/158398 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-158398 |
---|---|
record_format |
dspace |
school |
School of Computer Science and Engineering |
contact |
Chng Eng Siong, ASESChng@ntu.edu.sg |
degree |
Bachelor of Engineering (Computer Science) |
date_deposited |
2022-06-03T02:23:19Z |
last_modified |
2022-06-03T02:23:51Z |
citation |
Pang, J. H. (2022). Small footprint model for noisy far-field keyword spotting. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/158398 |
file_format |
application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
description |
Building a keyword spotting model with a small memory footprint is important because such models
typically run on mobile devices with limited computational resources. However, it is very
challenging to develop a lightweight model that still achieves state-of-the-art results in noisy
far-field environments, where background noise and reverberation degrade the performance of a
keyword spotting model. We explored a variety of baseline models and data processing techniques
for keyword prediction. In addition, we proposed a novel feature interactive convolution model
with a small number of parameters for single-channel and multi-channel utterances. The
interactive unit is implemented as an attention mechanism that enhances the flow of information
while using fewer computational resources. We also proposed a centroid-based awareness component
that improves the multi-channel system by providing additional spatial geometry information in
the latent feature projection space. The single-channel model was evaluated on the Google Speech
Commands V2-12 dataset, and the multi-channel model was evaluated on the MISP Challenge 2021
dataset. Our single-channel model achieves 98.2% accuracy on the original Google Speech Commands
dataset and outperforms most previous small models. Our multi-channel model also substantially
improves on the official competition baseline, with a 55% relative gain in the competition score
(to 0.152) on 6-channel audio input and a 63% relative gain (to 0.126) when traditional
front-end speech enhancement is used. |
author2 |
Chng Eng Siong |
format |
Final Year Project |
author |
Pang, Jin Hui |
title |
Small footprint model for noisy far-field keyword spotting |
publisher |
Nanyang Technological University |
publishDate |
2022 |
url |
https://hdl.handle.net/10356/158398 |