CONSTRUCTION OF RUDE-WORDS DETECTION SYSTEM FOR INDONESIAN SPEECH

Advances in technology have made it easier for people to create and access information in video form. Video can deliver information to a wide audience through broadcasts and websites, for example movies and news. Unfortunately, video does not only contain necessar...

Bibliographic Details
Main Author: NOVITASARI (NIM: 13514027), SASHI
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/30833
Institution: Institut Teknologi Bandung
id id-itb.:30833
spelling id-itb.:30833 2018-06-26T11:04:13Z CONSTRUCTION OF RUDE-WORDS DETECTION SYSTEM FOR INDONESIAN SPEECH NOVITASARI (NIM: 13514027), SASHI Indonesia Final Project INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/30833 text
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Advances in technology have made it easier for people to create and access information in video form. Video can deliver information to a wide audience through broadcasts and websites, for example movies and news. Unfortunately, alongside useful information, video may also contain inappropriate content that can affect people's behavior. The possible negative influence of such content has created a need to control the distribution and content of video. One method of achieving this is content control through censorship. In video censorship, one necessary task is searching for inappropriate words, such as rude words, in the video's speech audio. Current censorship methods, including content checking, are time-consuming. As one solution to this problem, this research constructed a rude-words detection system for speech audio, mainly for Indonesian speech. The system can be used in the censorship process to search for rude words in speech, shortening the time needed to censor video or speech.

The system uses a machine learning model to detect rude words. The model classifies either the words in a sentence or whole sentences, depending on the features used. The learning algorithms tested during construction were SVM and neural networks (FFNN, LSTM, and Bi-LSTM). Two kinds of features were used in the experiments: textual features from the speech transcription and acoustic features from the speech audio. The textual features consist of word embeddings, POS tags, TF-IDF, a wordlist, sentence embeddings, and N-grams. The acoustic features consist of pitch, MFCC, the INTERSPEECH 2009 feature set, and the INTERSPEECH 2010 feature set. The experiments showed that an FFNN model using word embeddings, trigram POS tags, TF-IDF, the wordlist, sentence embeddings, and MFCC performed better than the other models. This model achieved a word classification F1-score of 87.80%.
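
As a rough illustration of two ideas named in the abstract, the sketch below computes a binary wordlist feature per token and a word-level F1-score. It is not the thesis's implementation: the function names, the placeholder wordlist entries, the sample tokens, and the gold labels are all hypothetical, and the wordlist lookup stands in for the trained FFNN, which is not described in enough detail here to reproduce.

```python
from typing import List, Set


def wordlist_feature(tokens: List[str], wordlist: Set[str]) -> List[int]:
    """Return 1 for each token found in the wordlist, else 0."""
    return [1 if token.lower() in wordlist else 0 for token in tokens]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive class (label 1) over aligned word labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical placeholder data (assumption, not taken from the thesis).
WORDLIST = {"kata_kasar_a", "kata_kasar_b"}
tokens = ["dia", "bilang", "kata_kasar_a", "lalu", "kata_kasar_c"]
gold = [0, 0, 1, 0, 1]  # assumed gold labels: 1 marks a rude word

pred = wordlist_feature(tokens, WORDLIST)
score = f1_score(gold, pred)
print(pred, round(score, 4))
```

In this toy run the wordlist misses the unseen token `kata_kasar_c`, which lowers recall. That kind of gap is one plausible reason to combine a wordlist with learned features such as embeddings, as the abstract describes.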