DISAMBIGUATION OF STRUCTURAL AMBIGUITY IN INDONESIAN SPEECH BY UTILIZING PROSODIC INFORMATION BASED ON TRANSFORMER FRAMEWORKS FOR SPEECH-TO-TEXT TRANSLATION

Ambiguity, particularly structural ambiguity, is one of the challenges in natural language that is still overlooked by most Indonesian speech recognition systems. No speech recognition system has utilized prosodic information to address structural ambiguity. Therefore, this study develops the first...

Bibliographic Details
Main Author:	Faradishi Widiaputri, Ruhiyah
Format:	Final Project
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/75260
Institution:	Institut Teknologi Bandung

id	id-itb.:75260
spelling	id-itb.:75260; 2023-07-26T11:04:46Z; keywords: structurally ambiguous sentences, prosody, speech recognition, speech-to-text translation
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesian
description	Ambiguity, particularly structural ambiguity, is a challenge in natural language that most Indonesian speech recognition systems still overlook. No speech recognition system has utilized prosodic information to address structural ambiguity. Therefore, this study develops the first system capable of disambiguating structurally ambiguous Indonesian utterances into unambiguous interpretation texts by using the prosodic information in the utterances. The contributions of this study are a structurally ambiguous speech corpus and an Indonesian speech disambiguation system. The corpus was created by generating structurally ambiguous sentences, each with two interpretations, and recording them as speech. Two prosodic cues were used for disambiguation: pause and pitch. Pauses were represented by mel-spectrogram and energy features, and pitch by F0. The disambiguation systems were built with the Transformer framework by adapting the cascade and direct approaches to speech-to-text mapping, specifically those used in speech-to-text translation. The cascade approach comprises an ASR system followed by a new model called the Text Disambiguation (TD) model, while the direct approach consists of a single new model called the Speech Disambiguation (SD) model. The resulting corpus contains 400 structurally ambiguous sentences and 4800 structurally ambiguous utterances in Indonesian. The findings show that the constructed disambiguation systems can produce fairly accurate interpretation texts. The best-performing system is the direct approach with the mel-spectrogram concatenated with F0 and energy as audio input, which achieved an average disambiguation accuracy of 82.2%. The best cascade system, which adds meaning tags and uses the same input combination, performs slightly worse, with an average disambiguation accuracy of 79.6%.
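The description mentions a mel-spectrogram "concatenated with F0 and energy" as audio input. The record does not say how the features are aligned or along which axis they are joined, so the following is only a minimal sketch under the assumption of per-frame concatenation along the feature axis. The function name `concat_prosodic_features`, the toy values, and the three-bin mel frames are hypothetical and chosen for illustration, not taken from the thesis.

```python
def concat_prosodic_features(mel, f0, energy):
    """Append per-frame F0 and energy to each mel-spectrogram frame.

    mel:    list of T frames, each a list of mel-bin values
    f0:     list of T pitch values (assumption: 0.0 marks an unvoiced frame)
    energy: list of T energy values
    Returns a list of T frames, each of length n_mels + 2.
    """
    if not (len(mel) == len(f0) == len(energy)):
        raise ValueError("all features must cover the same number of frames")
    # Assumed layout: [mel bins..., F0, energy] for every frame.
    return [list(frame) + [pitch, e] for frame, pitch, e in zip(mel, f0, energy)]


# Toy input: 4 frames with 3 mel bins each (real systems use many more bins).
mel = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.0, 0.0, 0.1],  # low values and low energy, as in a pause
    [0.7, 0.8, 0.9],
]
f0 = [180.0, 185.0, 0.0, 170.0]
energy = [0.40, 0.42, 0.01, 0.35]

features = concat_prosodic_features(mel, f0, energy)
print(len(features), len(features[0]))
```

Under this layout the pause cue (low energy, unvoiced frame) and the pitch cue (F0 contour) sit in the same frame vector as the spectral content, so a single encoder could attend to both.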
format	Final Project
author	Faradishi Widiaputri, Ruhiyah
title	DISAMBIGUATION OF STRUCTURAL AMBIGUITY IN INDONESIAN SPEECH BY UTILIZING PROSODIC INFORMATION BASED ON TRANSFORMER FRAMEWORKS FOR SPEECH-TO-TEXT TRANSLATION
url	https://digilib.itb.ac.id/gdl/view/75260
_version_	1822994294706798592