Multi-language Southeast Asian speech2text development

Bibliographic Details
Main Author: Priya Kanakarajan
Other Authors: Jiang Xudong
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Online Access: https://hdl.handle.net/10356/158284
Institution: Nanyang Technological University
Description
Summary: This project aims to improve current code-switch automatic speech recognition (ASR) technology for English paired with a Dravidian language, and to support the future development of an engine that can recognize speech in both English and a Dravidian language. As I am a native speaker of English and Tamil, these are the two languages chosen for the project. An ASR system consists of an acoustic model, a language model and a pronunciation lexicon. This report investigates how better language models can be trained to improve the ASR output. To this end, we also aim to expand the text corpus, not only by collecting data but also by generating it.