Speech recognition and synthesis
Recent advances in text-to-speech have been awe-inspiring, to the point of synthesising near-human speech. To achieve this, deep neural networks are trained on many sound clips of a single speaker, which means a traditional text-to-speech system requires an entirely new dataset, and retraining of the model, to produce the voice of a new speaker. Using a recently developed three-stage system, a trained model can clone speakers' voices that were unseen during training: an encoder encapsulates the critical features of a speaker from a short clip. Researchers have previously developed models with such capabilities, and this project builds on that work with newer synthesiser implementations and vocoders to reduce training time and improve naturalness. This paper examines two such methods and analyses the different models that can be used in such a system.
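For readers unfamiliar with the three-stage system the abstract refers to, the sketch below traces the data flow of a typical speaker-encoder, synthesiser, and vocoder pipeline. It is a minimal illustration under assumed interfaces: the class names, the 256-dimensional embedding, and the mel-spectrogram intermediate are hypothetical placeholders, not the implementation developed in this project.

```python
# Minimal sketch of a three-stage voice-cloning pipeline (speaker encoder ->
# synthesiser -> vocoder). Every name, shape, and the mel-spectrogram
# intermediate below is a hypothetical placeholder, not this project's code.
import numpy as np


class SpeakerEncoder:
    """Maps a short reference clip to a fixed-size speaker embedding."""

    def embed(self, reference_wav: np.ndarray) -> np.ndarray:
        # Placeholder output; a real encoder is a trained neural network.
        return np.zeros(256, dtype=np.float32)


class Synthesiser:
    """Predicts a mel spectrogram from text, conditioned on a speaker embedding."""

    def synthesise(self, text: str, speaker_embedding: np.ndarray) -> np.ndarray:
        # Placeholder spectrogram of shape (frames, mel bins).
        return np.zeros((200, 80), dtype=np.float32)


class Vocoder:
    """Converts a mel spectrogram into an audible waveform."""

    def infer(self, mel: np.ndarray) -> np.ndarray:
        # Placeholder waveform; a real vocoder upsamples each spectrogram
        # frame to hundreds of audio samples.
        return np.zeros(mel.shape[0] * 256, dtype=np.float32)


def clone_voice(reference_wav: np.ndarray, text: str) -> np.ndarray:
    """Say `text` in the voice of `reference_wav` without retraining any stage."""
    embedding = SpeakerEncoder().embed(reference_wav)
    mel = Synthesiser().synthesise(text, embedding)
    return Vocoder().infer(mel)
```

The point of the arrangement is that only the short reference clip changes from speaker to speaker; none of the three trained stages is retrained to clone a new voice.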
Main Author: | Kang, Yi Da |
---|---|
Other Authors: | Tan Yap Peng; School of Electrical and Electronic Engineering |
Format: | Final Year Project (FYP) |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Degree: | Bachelor of Engineering (Electrical and Electronic Engineering) |
Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/167681 |
Citation: | Kang, Y. D. (2023). Speech recognition and synthesis. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/167681 |
Institution: | Nanyang Technological University |