Human-robot interaction : speech recognition

Bibliographic Details
Main Author: Tan, Roland Rustan.
Other Authors: Lau Wai Shing, Michael
Format: Final Year Project
Language: English
Published: 2011
Online Access:http://hdl.handle.net/10356/42865
Institution: Nanyang Technological University
Description
Summary: The aim of this project is to develop a speech recognition system that can then be used for human-robot interaction. The system receives speech input from users, analyzes it by extracting features of the speech, searches for and matches the input features against the pre-recorded speech features stored in a trained database/codebook, and returns the best-matching result to the user. The system is intended to offer an alternative way of interacting with a robot, providing natural, social-style human-robot interaction. Verbal interaction is very popular in robotics, especially in personal assistive robots used to help elderly people and in entertainment robots. This project is limited to soccer-related commands and some entertainment purposes, including playing music. For a speech recognition system to work, it needs acoustic models and language models. The acoustic model is a collection of features extracted from the pre-recorded speech; the Mel-Frequency Cepstral Coefficients (MFCC) algorithm was applied to extract features from the speech signals. The language model is a large list of words and their probabilities of occurrence in a given sequence. For the purposes of this project, grammars, a special type of language model that defines constraints on the words expected as input, were used. Julius, an open-source speech recognition engine, was used in this project to enable human-robot verbal interaction. It was chosen after an experiment that showed Julius to be more accurate than CMU Sphinx-4: Julius achieved an average accuracy of 84.865%, while CMU Sphinx-4 achieved only 79.855%.
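
As an illustration of the MFCC step described above, the sketch below computes per-frame cepstral features from a recording. It is a minimal sketch only: it assumes the librosa Python library and a hypothetical file name, not the report's actual implementation or parameter choices.

    # Minimal MFCC extraction sketch (librosa assumed; file name hypothetical).
    import librosa

    # Load a pre-recorded command utterance, resampled to 16 kHz.
    signal, rate = librosa.load("kick_ball.wav", sr=16000)

    # Compute 13 Mel-Frequency Cepstral Coefficients per analysis frame.
    # These per-frame feature vectors are what the recognizer matches
    # against the stored features in the trained database/codebook.
    mfcc = librosa.feature.mfcc(y=signal, sr=rate, n_mfcc=13)
    print(mfcc.shape)  # (13, number_of_frames)

The choice of 13 coefficients is a common convention in speech recognition front ends, not a value taken from the report.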
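
For the grammar-based language model, Julius takes a rule file (.grammar) and a vocabulary file (.voca), which are compiled into a recognizer grammar with the mkdfa tool shipped with Julius. The fragment below is a hedged sketch of what a soccer-command grammar might look like; the category names, words, and phoneme strings are hypothetical and would depend on the acoustic model actually used.

    # soccer.grammar -- a sentence is silence, an action, an object, silence
    S       : NS_B COMMAND NS_E
    COMMAND : ACTION OBJECT

    # soccer.voca -- words in each category with their phoneme strings
    % NS_B
    <s>      sil
    % NS_E
    </s>     sil
    % ACTION
    kick     k ih k
    pass     p ae s
    % OBJECT
    ball     b ao l

Constraining the recognizer to such a small command set is what makes grammars attractive here: only word sequences the grammar allows can ever be returned as recognition results.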