A lip geometry approach for feature-fusion based audio-visual speech recognition

Bibliographic Details
Main Authors:	Ibrahim, M. Z., Mulvaney, D. J.
Format:	Conference or Workshop Item
Language:	English
Published:	IEEE 2014
Subjects:	TK Electrical engineering. Electronics. Nuclear engineering
Online Access:	http://umpir.ump.edu.my/id/eprint/29900/1/A%20lip%20geometry%20approach%20for%20feature-fusion%20based%20audio.pdf http://umpir.ump.edu.my/id/eprint/29900/2/A%20lip%20geometry%20approach%20for%20feature-fusion%20based%20audio_FULL.pdf http://umpir.ump.edu.my/id/eprint/29900/ https://doi.org/10.1109/ISCCSP.2014.6877957
Institution:	Universiti Malaysia Pahang Al-Sultan Abdullah

Description
Summary:	This paper describes a feature-fusion audio-visual speech recognition (AVSR) system that extracts lip geometry from the mouth region using a combination of a skin color filter, border following and a convex hull, and performs classification using a Hidden Markov Model. By defining a small number of highly descriptive geometrical features relevant to the recognition task, the approach avoids the poor scalability (termed the "curse of dimensionality") that is often associated with feature-fusion AVSR methods. The paper compares the new approach with conventional appearance-based methods, namely the discrete cosine transform and principal component analysis, when operating under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the geometrical method significantly improves speech recognition accuracy compared with appearance-based approaches, despite requiring significantly fewer features.
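
As a rough illustration of the idea in the summary, the sketch below derives a few geometric descriptors from the convex hull of a binary lip mask and appends them to an audio feature vector (feature fusion). It is not the authors' implementation. The mask is assumed to come from an earlier skin color filtering step, all mask pixels are used in place of a border-following contour, and the chosen descriptors (width, height, hull area) as well as the names `convex_hull`, `lip_geometry` and `fuse` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def _cross(o: Point, a: Point, b: Point) -> int:
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain convex hull (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def lip_geometry(mask: Sequence[Sequence[int]]) -> List[float]:
    """Return [width, height, hull area] for a binary lip mask.

    Assumption: these three descriptors are illustrative only; the
    paper's actual geometric feature set is not given in the summary.
    """
    points = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    if not points:
        return [0.0, 0.0, 0.0]
    hull = convex_hull(points)
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    # Shoelace formula for the area enclosed by the hull polygon.
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return [float(max(xs) - min(xs)), float(max(ys) - min(ys)),
            abs(twice_area) / 2.0]


def fuse(audio: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Feature fusion by concatenating per-frame audio and visual vectors."""
    return list(audio) + list(visual)


# Toy lip mask, as if produced by an earlier segmentation step.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
visual = lip_geometry(mask)
fused = fuse([0.1, 0.2], visual)  # placeholder audio features
```

Because the visual part contributes only a handful of values per frame, the fused vector stays short, which is the property the summary credits with avoiding the dimensionality problem of appearance-based features.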

Similar Items