Speech recognition and synthesis

Recent advances in text-to-speech have been awe-inspiring, to the point of synthesizing near-human speech. To achieve this, deep neural networks are trained using different sound clips of a single speaker. However, traditional text-to-speech systems require a whole new dataset to produce the v...

Bibliographic Details
Main Author: Kang, Yi Da
Other Authors: Tan Yap Peng
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/167681
Institution: Nanyang Technological University
id sg-ntu-dr.10356-167681
record_format dspace
spelling sg-ntu-dr.10356-1676812023-11-29T07:38:12Z Speech recognition and synthesis Kang, Yi Da Tan Yap Peng School of Electrical and Electronic Engineering EYPTan@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Electrical and electronic engineering Recent advances in text-to-speech have been awe-inspiring, to the point of synthesizing near-human speech. To achieve this, deep neural networks are trained using different sound clips of a single speaker. However, traditional text-to-speech systems require a whole new dataset, and retraining of the model, to produce the voice of a new speaker. Using a recently developed three-stage system, trained models can clone the voices of speakers unseen during training. An encoder captures the critical features of a speaker from a short clip. Researchers have previously developed models with such capabilities. This project intends to build on that work with newer synthesiser implementations and vocoders to reduce training time and improve naturalness. This paper examines two such methods and analyses different models that can be used in such a system. Bachelor of Engineering (Electrical and Electronic Engineering) 2023-05-30T04:55:01Z 2023-05-30T04:55:01Z 2023 Final Year Project (FYP) Kang, Y. D. (2023). Speech recognition and synthesis. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167681 https://hdl.handle.net/10356/167681 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Electrical and electronic engineering
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Electrical and electronic engineering
Kang, Yi Da
Speech recognition and synthesis
description Recent advances in text-to-speech have been awe-inspiring, to the point of synthesizing near-human speech. To achieve this, deep neural networks are trained using different sound clips of a single speaker. However, traditional text-to-speech systems require a whole new dataset, and retraining of the model, to produce the voice of a new speaker. Using a recently developed three-stage system, trained models can clone the voices of speakers unseen during training. An encoder captures the critical features of a speaker from a short clip. Researchers have previously developed models with such capabilities. This project intends to build on that work with newer synthesiser implementations and vocoders to reduce training time and improve naturalness. This paper examines two such methods and analyses different models that can be used in such a system.
author2 Tan Yap Peng
author_facet Tan Yap Peng
Kang, Yi Da
format Final Year Project
author Kang, Yi Da
author_sort Kang, Yi Da
title Speech recognition and synthesis
title_short Speech recognition and synthesis
title_full Speech recognition and synthesis
title_fullStr Speech recognition and synthesis
title_full_unstemmed Speech recognition and synthesis
title_sort speech recognition and synthesis
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/167681
_version_ 1783955485996613632