Automatic visual speech recognition
One of the most challenging tasks in automatic visual speech recognition is the extraction of feature parameters from image sequences of lips. There are two main approaches to extracting visual speech information from image sequences: the model-based approach and the pixel-based approach. The advantage of the model-based approach is that the parameters of the lip contour model are less influenced by variability in lighting conditions, lip location and rotation, but constructing an efficient yet robust lip contour model capable of tracking the lip is difficult. The pixel-based approach, on the other hand, must take variability in lighting conditions, lip rotation and location into account. Despite much research, lip tracking remains a challenging task because of the wide variation in face images. The pixel-based approach was adopted in this project. Raw data for visual speech recognition were obtained using a digital camcorder. The video recordings were converted to image sequences, and the speaker's lip region was extracted from each frame. Once the lip was located in a frame, its boundaries were obtained, and the lip contour was drawn from these boundaries using least-squares polynomial fitting. Ten visual speech features were extracted from every frame and then quantized. The resulting vector sequences were used to train hidden Markov models (HMMs), and the trained models were used to recognize unknown vector sequences.
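As a rough illustration of the front end described above, the sketch below reads a recording into frames and crops a lip region. It is a minimal sketch, not the project's code: the file name and the fixed crop coordinates are hypothetical, and the thesis locates the lip in every frame rather than using a fixed crop.

```python
import cv2  # OpenCV for video I/O

# Hypothetical recording; the thesis's camcorder data are not available here.
cap = cv2.VideoCapture("speaker01.avi")
frames = []
while True:
    ok, frame = cap.read()   # read frames one by one until the stream ends
    if not ok:
        break
    frames.append(frame)
cap.release()

# Placeholder lip region: a fixed crop standing in for per-frame lip location.
lip_regions = [f[120:180, 80:200] for f in frames]
```

The contour-fitting step is sketched after the summary table below, and the quantization and HMM steps after the record fields.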
| Main Authors: | Irwan Widjojo; Lee, Kean Hin |
|---|---|
| Other Authors: | Foo Say Wei |
| School: | School of Electrical and Electronic Engineering |
| Format: | Final Year Project (FYP) |
| Degree: | Bachelor of Engineering |
| Language: | English |
| Published: | 2016 |
| Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
| Physical Description: | 93 p. (application/pdf) |
| Online Access: | http://hdl.handle.net/10356/68997 |
| Institution: | Nanyang Technological University |
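Continuing the pipeline, the lip contour is drawn from the detected boundaries by least-squares polynomial fitting. In this sketch the boundary points, the quadratic degree, and the two derived features (mouth width and opening height) are all assumptions for illustration; the record does not enumerate the thesis's ten features. `np.polyfit` performs the least-squares fit.

```python
import numpy as np

# Hypothetical (x, y) pixel coordinates along the detected lip boundaries
# of one frame; real values come from the lip-location step.
upper = np.array([[10, 52], [20, 48], [30, 46], [40, 48], [50, 52]], float)
lower = np.array([[10, 52], [20, 58], [30, 61], [40, 58], [50, 52]], float)

# Least-squares fit of each boundary (degree 2 is an assumption).
up_coef = np.polyfit(upper[:, 0], upper[:, 1], deg=2)
lo_coef = np.polyfit(lower[:, 0], lower[:, 1], deg=2)

# Evaluate the fitted contours on a common grid and derive two features.
xs = np.linspace(10, 50, 81)
up_y = np.polyval(up_coef, xs)
lo_y = np.polyval(lo_coef, xs)
mouth_width = float(xs[-1] - xs[0])
mouth_opening = float(np.max(lo_y - up_y))   # peak upper-lower separation
```

Repeating this for every frame gives one feature vector per frame; the quantization and HMM recognition steps are sketched at the end of this record.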
id: sg-ntu-dr.10356-68997
record_format: dspace
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: DRNTU::Engineering::Electrical and electronic engineering
author2: Foo Say Wei
format: Final Year Project
author: Irwan Widjojo; Lee, Kean Hin
author_sort: Irwan Widjojo
title: Automatic visual speech recognition
title_sort: automatic visual speech recognition
publishDate: 2016
url: http://hdl.handle.net/10356/68997
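Finally, the back end of the pipeline: each frame's feature vector is quantized against a codebook, and the resulting symbol sequence is scored by one HMM per word, picking the most likely model. This is a minimal sketch with invented numbers: the codebook size, HMM topology, and two-word vocabulary are assumptions, and the random parameters stand in for models the thesis would train (e.g. with Baum-Welch) on quantized training sequences.

```python
import numpy as np
from scipy.special import logsumexp

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codeword (VQ)."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

def log_likelihood(obs, pi, A, B):
    """Forward algorithm in log space: log P(obs | discrete HMM)."""
    alpha = np.log(pi) + np.log(B[:, obs[0]])
    for o in obs[1:]:
        alpha = logsumexp(alpha[:, None] + np.log(A), axis=0) + np.log(B[:, o])
    return float(logsumexp(alpha))

rng = np.random.default_rng(0)
features = rng.random((25, 10))      # 25 frames x 10 visual features (toy data)
codebook = rng.random((4, 10))       # e.g. trained beforehand with k-means/LBG
obs = quantize(features, codebook)   # discrete observation sequence

def random_hmm(n_states=3, n_symbols=4):
    """Random placeholder parameters; a real system trains these per word."""
    pi = np.full(n_states, 1.0 / n_states)
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(axis=1, keepdims=True)
    return pi, A, B

models = {"yes": random_hmm(), "no": random_hmm()}   # one HMM per word
best = max(models, key=lambda w: log_likelihood(obs, *models[w]))
print("recognised as:", best)
```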