SPEAKER RECOGNITION USING MOBILENETV3 FOR VOICE-BASED ROBOT NAVIGATION
Giving robots the ability to recognize the different people they are speaking to is a first step toward enhancing their perceptual and reasoning abilities. In this context, our research is implemented on a delivery robot whose control and interaction must be constrained. Implementing a speaker rec...
Main Author: | Mawadda Warohma, Ayu |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/83741 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:83741 |
---|---|
keywords |
speaker recognition, OSTI-TI, MobileNet V3, FastResnet-34, Human robot interaction (HRI) |
timestamp |
2024-08-12T20:49:59Z |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
description |
Giving robots the ability to recognize the different people they are speaking to is a first step toward enhancing their perceptual and reasoning abilities. In this context, our research is implemented on a delivery robot whose control and interaction must be constrained. The speaker recognition system is designed to ensure that the robot executes commands only from authorized speakers and rejects commands from unauthorized speakers. This motivates our research on text-independent speaker recognition in the context of human-robot interaction. To develop the speaker recognition system, we used the d-vector speaker embedding representation with the MobileNet V3 architecture and compared the performance of the proposed method with the Fast ResNet-34 architecture. We also tested MFCC and Mel-scaled spectrogram features to determine which feature representation suits our architecture. The proposed system was evaluated on Indonesian datasets recorded in various acoustic environments. Fast ResNet-34 achieves an AER of 5.756% with an accuracy of 94.78%, whereas MobileNet V3 achieves an AER of 7.014% with an accuracy of 93.88%. Although Fast ResNet-34 performs better, the MobileNet V3 approach improves computational efficiency by 98.27%, reduces model size by 87.47%, and shortens inference time by approximately 7 ms compared to Fast ResNet-34. |
author |
Mawadda Warohma, Ayu |
url |
https://digilib.itb.ac.id/gdl/view/83741 |
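
The abstract describes accepting commands only from authorized speakers using d-vector embeddings. The record does not say how embeddings are compared, so the following is only a minimal sketch under stated assumptions: frame-level embeddings are averaged into an utterance-level d-vector, scoring uses cosine similarity, and a command is rejected when the best score falls below a threshold. The function names, the toy 3-dimensional embeddings, the speaker labels, and the threshold of 0.7 are all hypothetical and are not taken from the thesis; in practice the embeddings would come from a trained network such as MobileNet V3.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def d_vector(frame_embeddings):
    """Average L2-normalized frame embeddings into one utterance-level vector."""
    normalized = [l2_normalize(f) for f in frame_embeddings]
    dim = len(normalized[0])
    mean = [sum(f[i] for f in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def identify(test_vec, enrolled, threshold=0.7):
    """Return the best-matching enrolled speaker, or None to reject.

    The threshold is an assumed value; a real system would tune it on
    held-out data.
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(test_vec, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy enrollment data (hypothetical speakers and embeddings).
enrolled = {
    "speaker_a": d_vector([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "speaker_b": d_vector([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]),
}

known = d_vector([[0.95, 0.05, 0.0]])    # close to speaker_a
unknown = d_vector([[0.0, 0.0, 1.0]])    # unlike any enrolled speaker

print(identify(known, enrolled))
print(identify(unknown, enrolled))
```

Returning `None` for a low best score is what makes this an open-set decision: an unenrolled voice is rejected rather than forced onto the nearest authorized speaker.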