DISAMBIGUATION OF STRUCTURAL AMBIGUITY IN INDONESIAN SPEECH BY UTILIZING PROSODIC INFORMATION BASED ON TRANSFORMER FRAMEWORKS FOR SPEECH-TO-TEXT TRANSLATION

Ambiguity, particularly structural ambiguity, is one of the challenges in natural language that is still overlooked by most Indonesian speech recognition systems. No speech recognition system has utilized prosodic information to address structural ambiguity. Therefore, this study develops the first...

Bibliographic Details
Main Author: Faradishi Widiaputri, Ruhiyah
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/75260
Institution: Institut Teknologi Bandung
id id-itb.:75260
timestamp 2023-07-26T11:04:46Z
keywords structurally ambiguous sentences, prosody, speech recognition, speech-to-text translation
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Ambiguity, particularly structural ambiguity, is a challenge in natural language that most Indonesian speech recognition systems still overlook. No speech recognition system has utilized prosodic information to address structural ambiguity. Therefore, this study develops the first system capable of disambiguating structurally ambiguous Indonesian utterances into unambiguous interpretation texts by using prosodic information from the utterances. The contributions of this study are a structurally ambiguous speech corpus and an Indonesian speech disambiguation system. The corpus was created by generating structurally ambiguous sentences, each with two interpretations, and recording them as speech. Two prosodic cues were used for disambiguation: pause and pitch. Pauses are represented by mel-spectrogram and energy features, and pitch by F0. The disambiguation systems were built by adapting both the cascade and the direct approach to speech-to-text mapping, specifically the approaches used in speech-to-text translation, using the Transformer framework. The cascade approach comprises an ASR system followed by a new model called the Text Disambiguation (TD) model, while the direct approach consists of a single new model called the Speech Disambiguation (SD) model. The resulting corpus contains 400 structurally ambiguous sentences and 4800 structurally ambiguous utterances in Indonesian. The findings show that the constructed disambiguation systems can produce fairly accurate interpretation texts. The best-performing system is the direct approach with mel-spectrogram concatenated with F0 and energy as audio input, which achieved an average disambiguation accuracy of 82.2%. The best cascade system, which adds meaning tags and uses the same input combination, performs slightly worse, with an average disambiguation accuracy of 79.6%.
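The description states that the best input was the mel-spectrogram concatenated with F0 and energy, but it does not say how the concatenation was done. The sketch below shows one plausible reading, assuming the three feature streams are aligned frame by frame and joined along the feature axis. The function name, the toy values, and the convention of using 0.0 for unvoiced frames are illustrative assumptions, not details taken from the thesis.

```python
from typing import List, Sequence


def concat_prosodic_features(
    mel: Sequence[Sequence[float]],
    f0: Sequence[float],
    energy: Sequence[float],
) -> List[List[float]]:
    """Append the per-frame F0 and energy values to each mel-spectrogram frame.

    Hypothetical helper: assumes all three streams share the same frame rate.
    """
    if not (len(mel) == len(f0) == len(energy)):
        raise ValueError("all feature streams must have the same number of frames")
    return [
        list(frame) + [pitch, power]
        for frame, pitch, power in zip(mel, f0, energy)
    ]


# Toy input: 3 frames with 4 mel bins each (values are made up for illustration).
mel = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0], [0.5, 0.4, 0.3, 0.2]]
f0 = [180.0, 0.0, 210.0]     # assumed convention: 0.0 marks an unvoiced or silent frame
energy = [0.62, 0.01, 0.70]  # a low-energy frame may correspond to a pause

features = concat_prosodic_features(mel, f0, energy)
print(len(features), len(features[0]))  # 3 frames, 4 mel bins + F0 + energy = 6 values
```

Under this reading, each frame passed to the model carries both pause-related cues (mel bins and energy) and the pitch cue (F0) in a single vector.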