Generative models for speech emotion synthesis

Several attempts have been made to synthesize speech from text. However, existing methods tend to generate speech that sounds artificial and lacks emotional content. In this project, we investigate using Generative Adversarial Networks (GANs) to generate emotional speech. WaveGAN (2019) was a fir...

Bibliographic Details
Main Author: Raj, Nathanael S.
Other Authors: Jagath C. Rajapakse
Format: Final Year Project
Language: English
Published: 2019
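Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence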
Online Access: http://hdl.handle.net/10356/76865
Institution: Nanyang Technological University
id sg-ntu-dr.10356-76865
record_format dspace
spelling sg-ntu-dr.10356-76865 2023-03-03T20:46:06Z Generative models for speech emotion synthesis Raj, Nathanael S. Jagath C. Rajapakse School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Bachelor of Engineering (Computer Science) 2019-04-20T06:12:15Z 2019-04-20T06:12:15Z 2019 Final Year Project (FYP) http://hdl.handle.net/10356/76865 en Nanyang Technological University 56 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
description Several attempts have been made to synthesize speech from text. However, existing methods tend to generate speech that sounds artificial and lacks emotional content. In this project, we investigate using Generative Adversarial Networks (GANs) to generate emotional speech. WaveGAN (2019) was a first attempt at generating speech from raw audio waveforms. It produced natural-sounding audio, including speech, bird chirps and drums. We applied WaveGAN to emotional speech data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), using all 8 emotion categories. We modified WaveGAN with two conditioning strategies: Sparse Vector Conditioning and Auxiliary Classifiers. In experiments conducted with human listeners, we found that these methods greatly helped subjects identify the generated emotions correctly, and improved the intelligibility and quality of the generated samples.
author2 Jagath C. Rajapakse
author_facet Jagath C. Rajapakse
Raj, Nathanael S.
format Final Year Project
author Raj, Nathanael S.
author_sort Raj, Nathanael S.
title Generative models for speech emotion synthesis
publishDate 2019
url http://hdl.handle.net/10356/76865
_version_ 1759855406200389632