Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language
This paper deals with the fast bootstrapping of a Grapheme-to-Phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-res...
Main Authors: | Juan, Sarah Samson; Besacier, Laurent |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2013 |
Subjects: | QA75 Electronic computers. Computer science; T Technology (General) |
Online Access: | http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf http://ir.unimas.my/id/eprint/8876/ http://www.aclweb.org/anthology/W13-4701 |
Institution: | Universiti Malaysia Sarawak |
id |
my.unimas.ir.8876 |
---|---|
record_format |
eprints |
spelling |
my.unimas.ir.88762015-10-16T01:10:04Z http://ir.unimas.my/id/eprint/8876/ Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language Juan, Sarah Samson Besacier, Laurent QA75 Electronic computers. Computer science T Technology (General) This paper deals with the fast bootstrapping of Grapheme-to-Phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR), and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-resourced language (Iban - spoken in Sarawak and in several parts of the Borneo Island) for which no resource nor knowledge is really available. More precisely, a pre-existing Malay G2P is used to produce phoneme sequences of Iban words. The phonemes are then manually post-edited (corrected) by an Iban native. This resource, which has been produced in a semi-supervised fashion, is later used to train the first G2P system for Iban language. As a by-product of this methodology, the analysis of the “pronunciation distance” between Malay and Iban enlighten the phonological and orthographic relations between these two languages. The experiments conducted show that a rather efficient Iban G2P system can be obtained after only two hours of post-edition (correction) of the output of Malay G2P applied to Iban words. 2013-10 Conference or Workshop Item PeerReviewed text en http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf Juan, Sarah Samson and Besacier, Laurent (2013) Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language. In: Proceedings of 4th Workshop on South and Southeast Asian Natural Language Processing 2013, Nagoya, Japan. http://www.aclweb.org/anthology/W13-4701 |
institution |
Universiti Malaysia Sarawak |
building |
Centre for Academic Information Services (CAIS) |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaysia Sarawak |
content_source |
UNIMAS Institutional Repository |
url_provider |
http://ir.unimas.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science T Technology (General) |
spellingShingle |
QA75 Electronic computers. Computer science T Technology (General) Juan, Sarah Samson Besacier, Laurent Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language |
description |
This paper deals with the fast bootstrapping of a Grapheme-to-Phoneme (G2P) conversion system, which is a key module for both automatic speech recognition (ASR) and text-to-speech synthesis (TTS). The idea is to exploit language contact between a local dominant language (Malay) and a very under-resourced language (Iban, spoken in Sarawak and in several parts of the island of Borneo) for which almost no resources or linguistic knowledge are available. More precisely, a pre-existing Malay G2P is used to produce phoneme sequences for Iban words. The phonemes are then manually post-edited (corrected) by a native Iban speaker. This resource, which has been produced in a semi-supervised fashion, is later used to train the first G2P system for the Iban language. As a by-product of this methodology, the analysis of the “pronunciation distance” between Malay and Iban sheds light on the phonological and orthographic relations between these two languages. The experiments conducted show that a rather efficient Iban G2P system can be obtained after only two hours of post-editing (correction) of the output of the Malay G2P applied to Iban words. |
format |
Conference or Workshop Item |
author |
Juan, Sarah Samson Besacier, Laurent |
author_facet |
Juan, Sarah Samson Besacier, Laurent |
author_sort |
Juan, Sarah Samson |
title |
Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language |
title_short |
Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language |
title_full |
Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language |
title_fullStr |
Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language |
title_full_unstemmed |
Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language |
title_sort |
fast bootstrapping of grapheme to phoneme system for under-resourced languages - application to the iban language |
publishDate |
2013 |
url |
http://ir.unimas.my/id/eprint/8876/1/wssanlp2013_sarah.pdf http://ir.unimas.my/id/eprint/8876/ http://www.aclweb.org/anthology/W13-4701 |
_version_ |
1644510620695920640 |
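
The "pronunciation distance" mentioned in the description, between the Malay G2P output and the post-edited Iban pronunciation, can be illustrated with a phoneme-level edit distance. The following is a minimal sketch only: this record does not state which measure the paper actually uses, the function names `edit_distance` and `phoneme_error_rate` are hypothetical, and the phoneme sequences in the demo are made-up placeholders, not real Malay or Iban data.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between the first i-1 reference
    # phonemes and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Total edits divided by total reference phonemes over all word pairs.

    Each pair is (post-edited reference, automatic G2P output).
    """
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_ref = sum(len(ref) for ref, _ in pairs)
    return total_edits / total_ref if total_ref else 0.0


if __name__ == "__main__":
    # Placeholder sequences (NOT real Iban or Malay pronunciations).
    corrected = ["a", "b", "c", "d"]   # stands in for the post-edited form
    automatic = ["a", "x", "c"]        # stands in for the Malay G2P output
    print(edit_distance(corrected, automatic))
    print(phoneme_error_rate([(corrected, automatic)]))
```

Under this assumption, a low error rate over the post-edited word list would indicate that the Malay G2P output needs few corrections, which is consistent with the short post-editing time reported in the description.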