ANALYSIS OF THE EFFECTIVENESS OF DEVELOPING AN INDONESIAN SPEECH RECOGNITION CORPUS USING AN ITERATIVE APPROACH
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/10148 |
Institution: | Institut Teknologi Bandung |
Summary: | A high-quality speech recognition (SR) system must be trained on a corpus containing a hundred or more utterance samples from a hundred or more speakers. Building such a corpus requires segmentation: the speech waveform in every training file is manually marked with the time boundaries of each linguistic unit. Developing a high-quality corpus is therefore resource-intensive and time-consuming. One way to accelerate development is an iterative approach. In this method, a small corpus is first developed manually. That small corpus is then used to automatically recognize and tag some of the sentences or words that will form the content of the next corpus. The result is edited manually and bundled with the first small corpus, and the bundle is in turn used to recognize and tag the content of the following corpus. In this way, a larger corpus is obtained. In this research, an Indonesian-language corpus consisting of 10,860 files is developed with the iterative approach. Analysis and measurement show that the system reaches an accuracy of about 95.28%. From this result, we conclude that a corpus developed with the iterative approach can produce good accuracy and is more efficient to build than one labeled manually.
|
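
The iterative approach described in the summary can be sketched as a simple loop. This is an illustrative sketch only, not the thesis's implementation: the record does not name the recognizer or tools used, so `grow_corpus`, `train`, `correct`, and the toy stand-ins below are all hypothetical names introduced here. The "recognizer" is a toy that can only label characters it has already seen in the corpus, and the "annotator" simply counts how many symbols it had to fix.

```python
from typing import Callable, Iterable, List, Tuple

Labeled = Tuple[str, str]       # (utterance, label sequence); hypothetical format
Model = Callable[[str], str]    # maps an utterance to a draft label


def grow_corpus(seed: List[Labeled],
                batches: Iterable[List[str]],
                train: Callable[[List[Labeled]], Model],
                correct: Callable[[List[Labeled]], List[Labeled]]) -> List[Labeled]:
    """Bootstrap a larger labeled corpus from a small hand-labeled seed."""
    corpus = list(seed)
    for batch in batches:
        model = train(corpus)                           # retrain on everything labeled so far
        drafts = [(utt, model(utt)) for utt in batch]   # automatic recognition and tagging
        corpus.extend(correct(drafts))                  # manual editing, then bundling
    return corpus


# Toy stand-ins (assumptions for illustration, not a real recognizer).
def toy_train(corpus: List[Labeled]) -> Model:
    known = {ch for utt, _ in corpus for ch in utt}
    # Characters never seen in the corpus are left as "?" for the annotator.
    return lambda utt: "".join(ch.upper() if ch in known else "?" for ch in utt)


edits: List[int] = []  # symbols the simulated annotator had to fix, per batch


def toy_correct(drafts: List[Labeled]) -> List[Labeled]:
    edits.append(sum(label.count("?") for _, label in drafts))
    return [(utt, utt.upper()) for utt, _ in drafts]


seed = [("satu", "SATU")]
batches = [["dua", "tiga"], ["gigi", "dasi"]]
corpus = grow_corpus(seed, batches, toy_train, toy_correct)
```

In this toy setup the first batch needs three manual fixes and the second needs none, which mirrors the intuition behind the approach: each round of corrected data makes the next round of automatic tagging cheaper to check. Real speech segmentation would replace the toy functions with an acoustic model and a human editing step.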