DEVELOPMENT OF LIP READING SYSTEM BASED ON INDONESIAN WORD CLASSIFICATION USING DEEP LEARNING

When communication is disrupted, lip reading is a technique people can use to understand a speaker by reading their lips. This technique is difficult to master in a silent environment, so much research has gone into building automatic lip reading systems with deep learning, especially in...

Bibliographic Details
Main Author:	Rahim, Annisa
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/72579
Institution:	Institut Teknologi Bandung

id	id-itb.:72579
spelling	id-itb.:72579 2023-04-17T11:16:29Z DEVELOPMENT OF LIP READING SYSTEM BASED ON INDONESIAN WORD CLASSIFICATION USING DEEP LEARNING Rahim, Annisa Indonesia Final Project lip reading, word classification, deep learning, Indonesian language INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/72579 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	When communication is disrupted, lip reading is a technique people can use to understand a speaker by reading their lips. This technique is difficult to master in a silent environment, so much research has gone into building automatic lip reading systems with deep learning, especially for English. For Indonesian, the biggest obstacle is the scarcity of data: the largest dataset currently available, AVID, contains command sentences. Its sentence structure is fixed (command + object + color + preposition + letter + digit), with a vocabulary of 51 words. Related studies worked at the sentence level, so their predictions were tightly constrained by the sentence structure of the dataset. This research aims to develop an Indonesian lip reading system with a freer sentence structure. It works at the word level, using AVID pre-processed into 2,550 videos with 51 word labels. Each video enters the preprocessing flow as a sequence of frames and passes through facial landmark detection, lip region cropping, and frame padding. The prediction system uses a two-stage flow inspired by OCR: the first stage classifies the word type, and the second stage consists of six word classification models, one per word type. Every model uses the same architecture: 3D Conv and ResNet (front-end) with MS-TCN (back-end). The final model uses face alignment in preprocessing. The first-stage (word-type) model reaches 72.9% accuracy, and the second-stage models for command, color, object, preposition, letter, and digit reach 85%, 83.3%, 92.5%, 57.5%, 41.5%, and 84%, respectively. The final system accuracy is 40.78%; it is lower because the first-stage model overfit to certain labels due to data imbalance. In one test, the system took 1 minute 27 seconds to load the models and predicted at 4.05 FPS.
format	Final Project
author	Rahim, Annisa
title	DEVELOPMENT OF LIP READING SYSTEM BASED ON INDONESIAN WORD CLASSIFICATION USING DEEP LEARNING
url	https://digilib.itb.ac.id/gdl/view/72579
_version_	1822279374395670528
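
The two-stage flow described in the abstract (a word-type classifier that routes each clip to one of six per-type word classifiers, after frame padding) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the function names, the repeat-last-frame padding strategy, the default clip length of 29 frames, and the plain-callable classifier interface are all hypothetical, and the real system uses 3D Conv + ResNet + MS-TCN models in place of the callables.

```python
from typing import Callable, Dict, List, Tuple

# Assumed data layout: a frame is a 2D grid of pixel values (one lip crop),
# and a clip is an ordered list of frames.
Frame = List[List[int]]
Clip = List[Frame]
Classifier = Callable[[Clip], str]  # hypothetical interface: clip -> label

# The six word types named in the abstract.
WORD_TYPES = ("command", "object", "color", "preposition", "letter", "digit")


def pad_frames(clip: Clip, target_len: int) -> Clip:
    """Return a clip of exactly target_len frames.

    Assumption: short clips are padded by repeating the last frame and
    long clips are truncated. The abstract only says "frame padding".
    """
    if not clip:
        raise ValueError("clip must contain at least one frame")
    if len(clip) >= target_len:
        return clip[:target_len]
    return clip + [clip[-1]] * (target_len - len(clip))


def two_stage_predict(
    clip: Clip,
    type_model: Classifier,
    word_models: Dict[str, Classifier],
    target_len: int = 29,  # arbitrary illustrative value
) -> Tuple[str, str]:
    """Stage 1 picks the word type; stage 2 picks the word within that type."""
    padded = pad_frames(clip, target_len)
    word_type = type_model(padded)          # first stage: word-type classification
    if word_type not in word_models:
        raise KeyError(f"no word model registered for type {word_type!r}")
    word = word_models[word_type](padded)   # second stage: per-type word classification
    return word_type, word
```

In a design like this, a wrong word type in the first stage sends the clip to the wrong second-stage model, so the error cannot be recovered later. That matches the abstract's remark that the final system accuracy dropped because of the first-stage model.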
