STRUCTURAL AMBIGUITY RESOLUTION IN INDONESIAN-ENGLISH SPEECH-TO-TEXT TRANSLATION SYSTEM BY UTILIZING PROSODIC INFORMATION OF SPEECH

Bibliographic Details
Main Author: Faradishi Widiaputri, Ruhiyah
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/86463
Institution: Institut Teknologi Bandung
Description
Summary: Structural ambiguity, in which a sentence admits more than one possible parse, is a natural-language problem that most current speech-to-text translation systems still overlook, so their translations either remain structurally ambiguous or are imprecise. Speech, however, carries prosodic information such as pauses and pitch that can be used to resolve such ambiguities. This study therefore develops a structural ambiguity-free speech-to-text translation system: one that not only translates speech in one language into text in another, but also uses the prosodic information of the input speech to resolve its structural ambiguity, so that the resulting translation is free from structural ambiguity. A corpus for structural ambiguity-free Indonesian-to-English speech-to-text translation was built by translating an Indonesian structural ambiguity corpus into English. The study then modifies the cascade and direct approaches of conventional speech-to-text translation so that they can use prosodic information to produce structural ambiguity-free output. The traditional cascade approach, which combines automatic speech recognition (ASR) and machine translation (MT), is modified into three proposed cascades: (1) ASR with additional meaning tags + text disambiguation (TD) + MT, (2) speech disambiguation (SD) + MT, and (3) ASR with additional meaning tags + disambiguation MT (DMT). A direct approach is also introduced through the direct speech translation (DST) model. The experimental results show that the proposed approaches can provide fairly good structural ambiguity-free translations. The best system in this study, the cascade of ASR with additional meaning tags and the new disambiguation MT (DMT) model, achieved a disambiguation accuracy of up to 78.13%.
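The three proposed cascades and the direct approach can be read as different ways of composing the same kinds of stages. The sketch below only illustrates that wiring and is not the thesis implementation. Every name in it (Stages, cascade_1, toy_stages, and so on) is hypothetical, each model is replaced by a stub function, and the input and output types of each stage are assumptions inferred from the summary: in particular, SD is assumed to map speech directly to unambiguous source-language text.

```python
from dataclasses import dataclass
from typing import Callable

Audio = bytes  # placeholder for a speech signal (assumption: any audio type would do)


@dataclass
class Stages:
    """Hypothetical container for the models named in the summary."""
    asr_tagged: Callable[[Audio], str]  # speech -> transcript with a meaning tag
    td: Callable[[str], str]            # tagged transcript -> unambiguous source text
    sd: Callable[[Audio], str]          # speech -> unambiguous source text (assumed)
    mt: Callable[[str], str]            # source text -> target text
    dmt: Callable[[str], str]           # tagged transcript -> unambiguous target text
    dst: Callable[[Audio], str]         # speech -> unambiguous target text


def cascade_1(s: Stages, audio: Audio) -> str:
    """(1) ASR with meaning tags + text disambiguation + MT."""
    return s.mt(s.td(s.asr_tagged(audio)))


def cascade_2(s: Stages, audio: Audio) -> str:
    """(2) Speech disambiguation + MT."""
    return s.mt(s.sd(audio))


def cascade_3(s: Stages, audio: Audio) -> str:
    """(3) ASR with meaning tags + disambiguation MT."""
    return s.dmt(s.asr_tagged(audio))


def direct(s: Stages, audio: Audio) -> str:
    """Direct speech translation with a single model."""
    return s.dst(audio)


def toy_stages() -> Stages:
    """Stub stages that only label their input, so the wiring is visible."""
    return Stages(
        asr_tagged=lambda audio: "ASR+tag",
        td=lambda text: f"TD({text})",
        sd=lambda audio: "SD",
        mt=lambda text: f"MT({text})",
        dmt=lambda text: f"DMT({text})",
        dst=lambda audio: "DST",
    )


if __name__ == "__main__":
    stages = toy_stages()
    for pipeline in (cascade_1, cascade_2, cascade_3, direct):
        print(pipeline.__name__, "->", pipeline(stages, b""))
```

With the stubs, each pipeline returns a string that records which stages ran and in what order, which makes the structural difference between the cascades easy to compare. Swapping in real models would only require replacing the callables in Stages.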