Optimizing model training for speech recognition
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: 2010
Subjects:
Online Access: http://hdl.handle.net/10356/40059
Institution: Nanyang Technological University
Summary: Modern speech recognition systems are generally based on statistical models that output a sequence of symbols or quantities. These models can be trained automatically and are simple and computationally practical to use. To reduce the long training time, the model training can be distributed across many machines for parallel processing. Apache Hadoop is a Java software framework that uses the Map-Reduce architecture to support data-intensive parallel and distributed processing.
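The abstract does not show how a training iteration maps onto Hadoop, but the general pattern can be sketched. In the minimal Java sketch below, mappers compute partial statistics over their shard of training utterances and a reducer sums them into global accumulators; the `Stat` type and `computePartialStats` routine are illustrative placeholders, not the project's actual code.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class TrainingIteration {

    // Illustrative placeholder: a partial statistic for one model parameter.
    static class Stat {
        final String paramId;
        final double value;
        Stat(String paramId, double value) { this.paramId = paramId; this.value = value; }
    }

    // Illustrative placeholder for the per-utterance accumulation step
    // (e.g., the expectation step of an EM-style training algorithm).
    static List<Stat> computePartialStats(String utterance) {
        return Collections.emptyList();
    }

    // Each map task reads a shard of training utterances and emits
    // partial statistics keyed by model parameter.
    public static class AccumulateMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable offset, Text utterance, Context context)
                throws IOException, InterruptedException {
            for (Stat s : computePartialStats(utterance.toString())) {
                context.write(new Text(s.paramId), new DoubleWritable(s.value));
            }
        }
    }

    // The reducer sums the partial statistics for each parameter; the summed
    // accumulators are used to re-estimate the model for the next iteration.
    public static class SumReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text paramId, Iterable<DoubleWritable> parts, Context context)
                throws IOException, InterruptedException {
            double sum = 0.0;
            for (DoubleWritable p : parts) {
                sum += p.get();
            }
            context.write(paramId, new DoubleWritable(sum));
        }
    }
}
```

Running the same job repeatedly, feeding each iteration's re-estimated model to the next, reproduces the kind of iterative loop measured in the report.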
The objective of this project is to tune the performance of model training for speech recognition by distributing and parallelizing the training process using the Hadoop framework. The performance of the optimization is measured for comparison and analysis.
The report also shows how the legacy scripts were ported to the Map-Reduce architecture and discusses the issues and challenges involved. With the aid of the Swimlanes visualization tool [1] for understanding and tuning job performance, various methods of processing the training data are explored and discussed in the report. Performance is measured over 100 iterations of the model training process on 4 nodes for each of the methods discussed.
From the results of the experiment, it is found that model training can be optimized by taking data locality into consideration in the software design.
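Data locality here means scheduling each map task on (or near) a node that already stores its input split, so training data is read from local disk rather than over the network. HDFS exposes the block placement that makes this possible; the short sketch below, with a hypothetical command-line path argument, prints which hosts hold each block of a training file.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Prints, for each HDFS block of a training file, the hosts that hold a
// replica. Hadoop's scheduler uses this placement information to run map
// tasks on (or near) the nodes storing their input split, which is the
// data-locality effect the report identifies as the key optimization.
public class BlockLocality {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path training = new Path(args[0]); // e.g. an HDFS path to training data

        FileStatus status = fs.getFileStatus(training);
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset %d, length %d, hosts %s%n",
                    b.getOffset(), b.getLength(),
                    String.join(",", b.getHosts()));
        }
    }
}
```

A locality-aware design arranges the input splits, or pre-places the training data, so that the Map-Reduce scheduler can satisfy most map tasks from these local replicas.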