Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language

This paper deals with the fast bootstrapping of a grapheme-to-phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-res...

Bibliographic Details
Main Authors: Juan, Sarah Samson; Besacier, Laurent
Format: Conference or Workshop Item
Language: English
Published: 2013
Subjects: QA75 Electronic computers. Computer science; T Technology (General)
Online Access: http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf
http://ir.unimas.my/id/eprint/8876/
http://www.aclweb.org/anthology/W13-4701
Institution: Universiti Malaysia Sarawak
id my.unimas.ir.8876
record_format eprints
datestamp 2015-10-16T01:10:04Z
date 2013-10
status PeerReviewed
citation Juan, Sarah Samson and Besacier, Laurent (2013) Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language. In: Proceedings of 4th Workshop on South and Southeast Asian Natural Language Processing 2013, Nagoya, Japan. http://www.aclweb.org/anthology/W13-4701
institution Universiti Malaysia Sarawak
building Centre for Academic Information Services (CAIS)
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sarawak
content_source UNIMAS Institutional Repository
url_provider http://ir.unimas.my/
language English
topic QA75 Electronic computers. Computer science
T Technology (General)
description This paper deals with the fast bootstrapping of a grapheme-to-phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-resourced language (Iban, spoken in Sarawak and in several parts of the island of Borneo) for which hardly any resources or knowledge are available. More precisely, a pre-existing Malay G2P system is used to produce phoneme sequences for Iban words. The phonemes are then manually post-edited (corrected) by a native Iban speaker. This resource, produced in a semi-supervised fashion, is later used to train the first G2P system for the Iban language. As a by-product of this methodology, the analysis of the “pronunciation distance” between Malay and Iban sheds light on the phonological and orthographic relations between the two languages. The experiments show that a rather efficient Iban G2P system can be obtained after only two hours of post-editing (correction) of the output of the Malay G2P system applied to Iban words.
format Conference or Workshop Item
author Juan, Sarah Samson
Besacier, Laurent
title Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language