STRUCTURAL AMBIGUITY RESOLUTION IN INDONESIAN-ENGLISH SPEECH-TO-TEXT TRANSLATION SYSTEM BY UTILIZING PROSODIC INFORMATION OF SPEECH
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/86463 |
Institution: | Institut Teknologi Bandung |
Summary: | Structural ambiguity, in which a sentence admits more than one possible
parse, is a natural-language problem that most current speech-to-text translation
systems still overlook, so their translations remain structurally ambiguous or
imprecise. Speech, however, carries prosodic information such as pauses and pitch
that can be used to resolve such ambiguities. This study therefore develops a
structural ambiguity-free speech-to-text translation system: one that not only
translates speech in one language into text in another language, but also uses the
prosodic information of the input speech to resolve its structural ambiguity, so
that the resulting translation is free from structural ambiguity.
In this study, a structural ambiguity-free speech-to-text translation corpus for
Indonesian to English was built by translating an Indonesian structural ambiguity
corpus into English. The study then proposes a structural ambiguity-free
speech-to-text translation system by modifying the cascade and direct approaches
of conventional speech-to-text translation so that they can use prosodic
information to produce structural ambiguity-free output. The traditional cascade
approach, which combines automatic speech recognition (ASR) and machine
translation (MT), is modified into three proposed cascades: (1) ASR with additional
meaning tags + text disambiguation (TD) + MT, (2) speech disambiguation (SD)
+ MT, and (3) ASR with additional meaning tags + disambiguation MT (DMT).
In addition, a direct approach is introduced in the form of a direct speech
translation (DST) model.
The experimental results show that the proposed approach yields fairly good
structural ambiguity-free translations. The best system in this study, the cascade
of ASR with additional meaning tags and the newly proposed disambiguation MT (DMT)
model, achieved a disambiguation accuracy of up to 78.13%. |
---|
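
The four system configurations named in the summary can be read as different compositions of stages. The sketch below is only an illustration of that data flow, not the thesis implementation: the stage functions are hypothetical stubs that do no real recognition or translation, and the intermediate representations ("tagged_text", "unambiguous_text") are assumptions inferred from the stage names in the summary.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Item:
    """A piece of data moving through a pipeline (hypothetical representation)."""
    kind: str                      # e.g. "speech", "tagged_text", "translation"
    trace: List[str] = field(default_factory=list)  # names of stages applied so far


Stage = Callable[[Item], Item]


def make_stage(name: str, expects: str, produces: str) -> Stage:
    """Build a stub stage that only checks its input kind and records its name."""
    def stage(item: Item) -> Item:
        if item.kind != expects:
            raise TypeError(f"{name} expects {expects!r}, got {item.kind!r}")
        return Item(kind=produces, trace=item.trace + [name])
    return stage


def cascade(*stages: Stage) -> Stage:
    """Chain stages so that each one consumes the previous stage's output."""
    def run(item: Item) -> Item:
        for stage in stages:
            item = stage(item)
        return item
    return run


# Stub stages; the input/output kinds are assumptions, not taken from the thesis.
asr_tagged = make_stage("ASR+tags", "speech", "tagged_text")
td = make_stage("TD", "tagged_text", "unambiguous_text")
sd = make_stage("SD", "speech", "unambiguous_text")
mt = make_stage("MT", "unambiguous_text", "translation")
dmt = make_stage("DMT", "tagged_text", "translation")
dst = make_stage("DST", "speech", "translation")

SYSTEMS = {
    "cascade_1": cascade(asr_tagged, td, mt),  # ASR with meaning tags + TD + MT
    "cascade_2": cascade(sd, mt),              # SD + MT
    "cascade_3": cascade(asr_tagged, dmt),     # ASR with meaning tags + DMT
    "direct": cascade(dst),                    # DST
}

if __name__ == "__main__":
    for name, system in SYSTEMS.items():
        out = system(Item(kind="speech"))
        print(f"{name}: {' + '.join(out.trace)} => {out.kind}")
```

Every configuration starts from speech and ends in a translation; they differ only in where disambiguation happens (in a separate text stage, in a speech stage, inside the MT model, or inside a single end-to-end model).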