Using Resources from a Closely-related Language to Develop ASR for a Very Under-resourced Language: A Case Study for Iban

This paper presents our strategies for developing an automatic speech recognition system for Iban, an under-resourced language. We faced several challenges such as no pronunciation dictionary and lack of training material for building acoustic models. To overcome these problems, we proposed approach...

Bibliographic Details
Main Authors: Juan, Sarah Samson; Besacier, Laurent; Lecouteux, Benjamin; Dyab, Mohamed
Format: Conference or Workshop Item
Language: English
Published: 2015
Subjects: QA75 Electronic computers. Computer science
Online Access: http://ir.unimas.my/id/eprint/8883/1/IS2015_samsonjuan_camera-ready.pdf
http://ir.unimas.my/id/eprint/8883/
Institution: Universiti Malaysia Sarawak
id my.unimas.ir.8883
record_format eprints
spelling my.unimas.ir.8883 2015-10-16T01:23:21Z http://ir.unimas.my/id/eprint/8883/ QA75 Electronic computers. Computer science 2015-09 Conference or Workshop Item PeerReviewed text en http://ir.unimas.my/id/eprint/8883/1/IS2015_samsonjuan_camera-ready.pdf Juan, Sarah Samson and Besacier, Laurent and Lecouteux, Benjamin and Dyab, Mohamed (2015) Using Resources from a Closely-related Language to Develop ASR for a Very Under-resourced Language: A Case Study for Iban. In: Proceedings of INTERSPEECH 2015, September 2015, Dresden, Germany.
institution Universiti Malaysia Sarawak
building Centre for Academic Information Services (CAIS)
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Sarawak
content_source UNIMAS Institutional Repository
url_provider http://ir.unimas.my/
language English
topic QA75 Electronic computers. Computer science
description This paper presents our strategies for developing an automatic speech recognition system for Iban, an under-resourced language. We faced several challenges such as no pronunciation dictionary and lack of training material for building acoustic models. To overcome these problems, we proposed approaches which exploit resources from a closely-related language (Malay). We developed a semi-supervised method for building the pronunciation dictionary and applied cross-lingual strategies for improving acoustic models trained with very limited training data. Both approaches displayed very encouraging results, which show that data from a closely-related language, if available, can be exploited to build ASR for a new language. In the final part of the paper, we present a zero-shot ASR using Malay resources that can be used as an alternative method for transcribing Iban speech.