DEVELOPMENT OF LIP READING SYSTEM BASED ON INDONESIAN WORD CLASSIFICATION USING DEEP LEARNING

Bibliographic Details
Main Author: Rahim, Annisa
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/72579
Institution: Institut Teknologi Bandung
id id-itb.:72579
timestamp 2023-04-17T11:16:29Z
keywords lip reading, word classification, deep learning, Indonesian language
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description When communication is disrupted, lip reading lets a listener follow speech by watching the speaker’s lips. The technique is difficult to master in a silent environment, so much research has gone into building automatic lip reading systems with deep learning, especially for English. For Indonesian, the biggest obstacle is limited data: the largest dataset currently available, AVID, contains only command sentences. Its sentences follow a fixed structure (command + object + color + preposition + letter + digit) built from 51 distinct words. Related studies worked at the sentence level, so their predictions were restricted to the sentence structure of the dataset. This research aims to develop an Indonesian lip reading system that allows a freer sentence structure. The approach works at the word level, using AVID pre-processed into 2,550 videos with 51 word labels. Each video is loaded into the preprocessing pipeline as a sequence of frames and passes through facial landmark detection, lip region cropping, and frame padding. The prediction system uses a two-stage flow inspired by OCR: the first stage classifies the word type, and the second stage contains six word classification models, one for each word type. Every model uses the same architecture: a 3D convolution and ResNet front-end with an MS-TCN back-end. The final model uses face alignment during preprocessing. The first-stage (word-type) model reaches 72.9% accuracy, and the second-stage models for command, color, object, preposition, letter, and digit reach 85%, 83.3%, 92.5%, 57.5%, 41.5%, and 84%, respectively. The final system accuracy is 40.78%; it is lower because the first-stage model overfit certain labels due to data imbalance. In one test, the system took 1 minute 27 seconds to load its models and ran prediction at 4.05 FPS.
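The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation. The function names (`pad_frames`, `predict_word`), the padding strategy (repeating the last frame), the default target length of 30 frames, and the stub classifiers are all assumptions made for illustration; the record does not specify any of them. Only the six word types and the routing idea (classify the word type first, then hand the clip to that type's word model) come from the abstract.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Word types taken from the sentence structure named in the abstract.
WORD_TYPES = ["command", "object", "color", "preposition", "letter", "digit"]

# A classifier is modelled as any callable mapping a frame sequence to a label.
# In the thesis these are neural networks; here they are hypothetical stubs.
Classifier = Callable[[Sequence[object]], str]


def pad_frames(frames: Sequence[object], target_len: int) -> List[object]:
    """Return exactly target_len frames.

    Assumption: short clips are padded by repeating the last frame and long
    clips are truncated. The abstract only says "frame padding".
    """
    if not frames:
        raise ValueError("a clip needs at least one frame")
    clipped = list(frames[:target_len])
    clipped.extend([clipped[-1]] * (target_len - len(clipped)))
    return clipped


def predict_word(
    frames: Sequence[object],
    type_model: Classifier,
    word_models: Dict[str, Classifier],
    target_len: int = 30,  # hypothetical clip length
) -> Tuple[str, str]:
    """Stage 1 picks the word type; stage 2 runs that type's word model."""
    padded = pad_frames(frames, target_len)
    word_type = type_model(padded)
    if word_type not in word_models:
        raise KeyError(f"no second-stage model for word type {word_type!r}")
    return word_type, word_models[word_type](padded)


if __name__ == "__main__":
    # Stub models standing in for the trained networks.
    stub_type_model: Classifier = lambda clip: "color"
    stub_word_models: Dict[str, Classifier] = {
        t: (lambda clip, t=t: f"{t}_word") for t in WORD_TYPES
    }
    print(predict_word(["f1", "f2", "f3"], stub_type_model, stub_word_models))
```

In a cascade like this, a first-stage mistake sends the clip to the wrong second-stage model, so the word prediction cannot be correct. That is consistent with the abstract's remark that the overall system accuracy is pulled down by the first-stage model.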