DEVELOPMENT OF LIP READING SYSTEM BASED ON INDONESIAN WORD CLASSIFICATION USING DEEP LEARNING
Main Author: | Rahim, Annisa |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/72579 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:72579 |
---|---|
keywords |
lip reading, word classification, deep learning, Indonesian language |
date |
2023-04-17T11:16:29Z |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
description |
When communication suffers from interference, lip reading is a technique humans can
use to read the speaker’s lips. This technique is difficult to master in a silent
environment, so much research has gone into building automatic lip reading systems
with deep learning, mostly for English. For Indonesian, the biggest obstacle is the
lack of data: the largest dataset currently available, AVID, contains only command
sentences. These sentences follow a fixed structure (command + object + color +
preposition + letter + digit) drawn from a vocabulary of 51 distinct words. Related
studies have worked at the sentence level, so their predictions are tightly limited
to the sentence structure of the dataset.
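Because every AVID sentence fills the same six slots, a word's type is determined purely by its position. The sketch below only illustrates that constraint and does not reproduce the author's preprocessing; `SENTENCE_TEMPLATE` and `label_word_types` are hypothetical names, the slot order is taken from the structure listed above, and the tokens are placeholders rather than real AVID words.

```python
# Assumed six-slot template of an AVID command sentence, in the order given above.
SENTENCE_TEMPLATE = ("command", "object", "color", "preposition", "letter", "digit")


def label_word_types(words: list[str]) -> list[tuple[str, str]]:
    """Pair each word of a template sentence with the word type of its slot."""
    if len(words) != len(SENTENCE_TEMPLATE):
        raise ValueError(
            f"expected {len(SENTENCE_TEMPLATE)} words, got {len(words)}"
        )
    # The template is fixed, so position alone determines the word type.
    return list(zip(words, SENTENCE_TEMPLATE))


if __name__ == "__main__":
    # Hypothetical placeholder tokens, not real AVID vocabulary.
    for word, word_type in label_word_types(["w1", "w2", "w3", "w4", "w5", "w6"]):
        print(word, word_type)
```

A sentence-level model trained on such data can only emit sentences of this exact shape, which is the rigidity a word-level approach tries to avoid.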
This research aims to develop an Indonesian lip reading system that allows a freer
sentence structure. The approach works at the word level, using AVID preprocessed
into 2,550 videos with 51 word labels. Each video is loaded into the preprocessing
pipeline as a sequence of frames, which then goes through facial landmark detection,
lip-region cropping, and frame padding. The prediction system uses a two-stage flow
inspired by OCR: the first stage classifies the word type, and the second stage
consists of six word classification models, one for each word type. Every model
uses a 3D convolution and ResNet front-end with an MS-TCN back-end.
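The preprocessing steps and the two-stage flow described above can be sketched as follows. This is a minimal, dependency-free illustration, not the author's implementation: it assumes 68-point facial landmarks in the iBUG 300-W layout (mouth at indices 48-67), assumes padding appends blank frames at the end of a clip, and uses hypothetical callables in place of the trained 3D Conv/ResNet/MS-TCN models. All function names are illustrative.

```python
import math
from typing import Callable

Frame = list[list[float]]          # one grayscale frame, indexed [row][col]
Point = tuple[float, float]        # (x, y) landmark in pixel coordinates
Box = tuple[int, int, int, int]    # (x0, y0, x1, y1), end-exclusive
Classifier = Callable[[list[Frame]], str]  # hypothetical stand-in for a trained model

# Assumption: 68-point landmarks in the iBUG 300-W layout, mouth = indices 48-67.
MOUTH_POINTS = range(48, 68)


def lip_box(landmarks: list[Point], margin: int = 0) -> Box:
    """Bounding box of the mouth landmarks, widened by `margin` pixels."""
    xs = [landmarks[i][0] for i in MOUTH_POINTS]
    ys = [landmarks[i][1] for i in MOUTH_POINTS]
    return (max(0, math.floor(min(xs)) - margin),
            max(0, math.floor(min(ys)) - margin),
            math.floor(max(xs)) + 1 + margin,
            math.floor(max(ys)) + 1 + margin)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the lip region out of one frame (slicing stops at the frame edge)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]


def pad_frames(frames: list[Frame], target_len: int) -> list[Frame]:
    """Truncate or zero-pad a clip to exactly `target_len` frames.

    Assumption: blank frames are appended at the end; the source only says
    "frame padding".
    """
    if not frames:
        raise ValueError("clip has no frames")
    height, width = len(frames[0]), len(frames[0][0])
    padded = frames[:target_len]  # a new list, so the input clip is not modified
    while len(padded) < target_len:
        padded.append([[0.0] * width for _ in range(height)])
    return padded


def two_stage_predict(clip: list[Frame], type_model: Classifier,
                      word_models: dict[str, Classifier]) -> tuple[str, str]:
    """Stage 1 predicts the word type; stage 2 runs only that type's word model."""
    word_type = type_model(clip)
    return word_type, word_models[word_type](clip)


if __name__ == "__main__":
    # Toy inputs: an 8x8 frame and landmarks whose mouth points span x=2..5, y=3..4.
    frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    landmarks = [(0.0, 0.0)] * 48 + [(2.0, 3.0)] * 10 + [(5.0, 4.0)] * 10
    lips = crop(frame, lip_box(landmarks))
    clip = pad_frames([lips] * 3, target_len=5)
    # Hypothetical stub models; real ones would be trained neural networks.
    word_models = {"digit": lambda c: "digit_word_0"}
    print(len(clip), len(lips), len(lips[0]))
    print(two_stage_predict(clip, lambda c: "digit", word_models))
```

Routing by word type means each second-stage model only has to separate the words of one type, at the cost that a wrong first-stage decision can never be recovered downstream.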
The final model applies face alignment during preprocessing. The first-stage
(word-type) model reaches 72.9% accuracy, and the second-stage models for command,
color, object, preposition, letter, and digit reach 85%, 83.3%, 92.5%, 57.5%,
41.5%, and 84%, respectively. The accuracy of the full system drops to 40.78%
because the first-stage model overfit to certain labels as a result of data
imbalance. In one test, the system took 1 minute 27 seconds to load its models and
predicted at 4.05 FPS. |
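Part of the gap between the 72.9% first-stage accuracy and the 40.78% system accuracy follows from how a cascade compounds errors: a clip is labeled correctly end to end only if both stages are right. The hypothetical helper below makes this concrete, assuming the six word vocabularies are disjoint and that predictions are available as (true type, predicted type, true word, predicted word) tuples. The toy samples are invented for illustration and are not the thesis's test data. Per-type recall of the first stage is one way to spot a model that favors certain labels.

```python
from collections import Counter

# Each sample: (true_type, predicted_type, true_word, predicted_word).
Sample = tuple[str, str, str, str]


def cascade_report(samples: list[Sample]) -> dict:
    """Stage-1 accuracy, end-to-end accuracy, and stage-1 recall per word type.

    With disjoint vocabularies, a correct word implies a correct type, so
    P(word correct) = P(type correct) * P(word correct | type correct),
    and system accuracy can never exceed stage-1 accuracy.
    """
    n = len(samples)
    type_hits = Counter(t for t, pt, _, _ in samples if t == pt)
    type_totals = Counter(t for t, _, _, _ in samples)
    return {
        "type_accuracy": sum(type_hits.values()) / n,
        "system_accuracy": sum(w == pw for _, _, w, pw in samples) / n,
        # Counter returns 0 for types that were never predicted correctly.
        "type_recall": {t: type_hits[t] / type_totals[t] for t in type_totals},
    }


if __name__ == "__main__":
    toy = [  # invented toy data, not results from the thesis
        ("digit", "digit", "w1", "w1"),
        ("digit", "digit", "w2", "w3"),
        ("letter", "digit", "w4", "w1"),
        ("color", "color", "w5", "w5"),
    ]
    print(cascade_report(toy))
```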
author |
Rahim, Annisa |