Unified framework for speaker-aware isolated word recognition

The explosive growth of various kinds of personal electronic devices in recent years has spawned substantial interest in personalized voice-based human-device interaction. There is a need for robust and computationally efficient techniques to help realize mobile and embedded computing applicatio...

Bibliographic Details
Main Author: George Rosario Dhinesh
Other Authors: Thambipillai Srikanthan
Format: Theses and Dissertations
Language: English
Published: 2011
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access: https://hdl.handle.net/10356/46279
Institution: Nanyang Technological University
id sg-ntu-dr.10356-46279
record_format dspace
spelling sg-ntu-dr.10356-46279 2023-03-04T00:47:57Z
Unified framework for speaker-aware isolated word recognition
George Rosario Dhinesh
Thambipillai Srikanthan
School of Computer Engineering
Centre for High Performance Embedded Systems
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
MASTER OF ENGINEERING (SCE)
2011-11-28T04:36:04Z 2011-11-28T04:36:04Z 2011 2011
Thesis
George, R. D. (2011). Unified framework for speaker-aware isolated word recognition. Master’s thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/46279
10.32657/10356/46279
en
133 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
description The explosive growth of various kinds of personal electronic devices in recent years has spawned substantial interest in personalized voice-based human-device interaction. There is a need for robust and computationally efficient techniques to help realize mobile and embedded computing applications that can recognize spoken words and the speaker who uttered them. Although spoken word recognition and speaker recognition are closely related problems with a number of commonalities, the current state of the art employs separate techniques to solve them. This thesis presents the research, development and prototyping of a speaker-aware isolated word recognition system based on a single, low-complexity technique suitable for resource-constrained mobile and embedded devices. A comprehensive literature survey has been carried out to study and evaluate the suitability of several existing techniques for embedded speaker-and-word recognition. Based on qualitative and performance analyses available in the literature, a framework based on Mel Frequency Cepstral Coefficients (MFCC) and the Gaussian Mixture Model (GMM) has been chosen as the base for our work. To expedite experimentation, an evaluation platform has been developed that can be rapidly configured with the desired values of the parameters involved in the GMM process. The challenging problem of recognizing a speaker from a single utterance of very short duration has been examined in detail. The effectiveness of GMM-based text-dependent and text-constrained speaker recognition approaches has been evaluated on the TI46 speech corpus, yielding recognition accuracies of 99.28% and 96.6%, respectively. We have proposed and evaluated a method of grouping similar sub-word units in text-constrained speaker recognition and obtained a recognition rate of 96.62%. 
A novel technique has been proposed to overcome the inability of a GMM to retain the temporal information of speech in word recognition. This technique models a word as a time-ordered sequence of GMMs, where each GMM corresponds to a sub-word unit, so that the order of the sub-word units is preserved.
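The idea of scoring a word against a time-ordered sequence of GMMs can be illustrated with a minimal sketch. The record does not say how the thesis segments an utterance or trains its models, so everything here is an assumption made for illustration: the utterance is cut into equal-length segments, the GMMs use diagonal covariances with hand-set parameters, one-dimensional toy values stand in for MFCC vectors, and all function names are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def split_uniform(frames, n_segments):
    """Assumption: cut the utterance into equal-length, time-ordered segments."""
    n = len(frames)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(n_segments)]

def score_word(frames, word_model):
    """Sum frame log-likelihoods, scoring segment k only against the k-th GMM."""
    segments = split_uniform(frames, len(word_model))
    return sum(
        gmm_log_likelihood(x, gmm)
        for segment, gmm in zip(segments, word_model)
        for x in segment
    )

def recognize(frames, word_models):
    """Return the label whose GMM sequence gives the highest score."""
    return max(word_models, key=lambda label: score_word(frames, word_models[label]))

# Toy 1-D "features" standing in for MFCC vectors (hypothetical data).
low = [(1.0, [0.0], [1.0])]    # single-component GMM centred at 0
high = [(1.0, [5.0], [1.0])]   # single-component GMM centred at 5
word_models = {"low-high": [low, high], "high-low": [high, low]}

rising = [[0.1], [-0.2], [0.3], [4.8], [5.1], [5.2]]
falling = list(reversed(rising))
```

In this toy setup the two "words" contain exactly the same frames in opposite order, so a single GMM pooled over all frames could not tell them apart; `recognize(rising, word_models)` and `recognize(falling, word_models)` differ only because each segment is scored against the GMM at the matching position in the sequence.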
author2 Thambipillai Srikanthan
author_facet Thambipillai Srikanthan
George Rosario Dhinesh
format Theses and Dissertations
author George Rosario Dhinesh
author_sort George Rosario Dhinesh
title Unified framework for speaker-aware isolated word recognition
publishDate 2011
url https://hdl.handle.net/10356/46279
_version_ 1759854584414601216