Recognition of visual speech elements using adaptively boosted hidden Markov models

Bibliographic Details
Main Authors: Foo, Say Wei, Lian, Yong, Dong, Liang
Format: Article
Language: English
Published: 2009
Online Access: https://hdl.handle.net/10356/91495
http://hdl.handle.net/10220/4584
http://sfxna09.hosted.exlibrisgroup.com:3410/ntu/sfxlcl3?sid=metalib:EVII&id=doi:10.1109/TCSVT.2004.826773&genre=&isbn=&issn=10518215&date=2004&volume=14&issue=5&spage=693&epage=705&aulast=Foo&aufirst=%20Say%20Wei&auinit=&title=IEEE%20Transactions%20on%20Circuits%20and%20Systems%20for%20Video%20Technology&atitle=Recognition%20of%20visual%20speech%20elements%20using%20adaptively%20boosted%20hidden%20markov%20models
Institution: Nanyang Technological University
Description
Summary: The performance of automatic speech recognition (ASR) systems can be significantly enhanced with additional information from visual speech elements such as the movement of the lips, tongue, and teeth, especially in noisy environments. In this paper, a novel approach for recognition of visual speech elements is presented. The approach makes use of adaptive boosting (AdaBoost) and hidden Markov models (HMMs) to build an AdaBoost-HMM classifier. The composite HMMs of the AdaBoost-HMM classifier are trained to cover different groups of training samples using the AdaBoost technique and the biased Baum–Welch training method. By combining the decisions of the component classifiers of the composite HMMs according to a novel probability synthesis rule, a more complex decision boundary is formulated than with a single HMM classifier. The method is applied to the recognition of the basic visual speech elements. Experimental results show that the AdaBoost-HMM classifier outperforms the traditional HMM classifier in accuracy, especially for visemes extracted from contexts.
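
Illustrative note: the boosting idea the summary relies on can be sketched with textbook binary AdaBoost. This is not the paper's method. The biased Baum–Welch training and the probability synthesis rule are not reproduced here, and simple threshold classifiers ("stumps") stand in for the HMM component classifiers. All names (train_stump, adaboost, predict) and the toy data are assumptions made for illustration only.

```python
import math

def stump_predict(x, threshold, polarity):
    # A stump votes `polarity` at or above the threshold, the opposite below.
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    # Exhaustively pick the stump with the lowest weighted training error.
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    # Textbook AdaBoost for labels in {-1, +1}.
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest,
        # so the next stump concentrates on the hard samples.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # Weighted vote of all component classifiers.
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data (hypothetical): no single threshold separates the classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy set every individual stump misclassifies at least three points, while the weighted vote of three stumps classifies all ten correctly. This is the sense in which combining component classifiers yields a more complex decision boundary than a single classifier.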