Development of a Mandarin learning tool for children using speech recognition model

Bibliographic Details
Main Author: Wang, Yilin
Other Authors: Tan Yap Peng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/177146
Institution: Nanyang Technological University
Description
Summary: This project report evaluates the performance of speech recognition and generation models on short Mandarin phrases and children's voices. It introduces a prototype framework for a Mandarin learning application that leverages these models, which have been fine-tuned to recognise the nuances of children's voices and short Chinese phrases. The primary goal of the study was to chart a development pathway for a learning tool that enhances children's educational experience. The proposed framework focuses on improving pronunciation, intonation, and understanding of Chinese characters (汉字) through a structured pedagogical approach. At the core of the project is an extensive adaptation of the Whisper model, engineered to handle the inherent variability of children's speech patterns and the tonal complexity of Mandarin. The methodology comprised three steps: assembling a children's audio dataset, testing model performance with a focus on children's voices, and fine-tuning the model to improve its accuracy on short Mandarin phrases. The prototype framework serves as a proof of concept, demonstrating the model's capabilities in a structured educational context. It outlines the envisioned interactive modules for reinforcing pronunciation, intonation, and character recognition. The project demonstrated the Whisper model's performance at recognising short phrases spoken by both adults and children. These results support the enhancements made to the model to better serve young learners and short-phrase recognition, and they culminate in the educational application prototype framework. The prototype uses speech technology to facilitate language learning, showcasing the potential of integrating speech recognition and generation technologies into educational tools. The findings lay groundwork for future research and development in this field.