Analysis and application of speech adaptation on avatar system


Bibliographic Details
Main Author: Luo, Fei
Other Authors: Justin Dauwels
Format: Theses and Dissertations
Language: English
Published: 2019
Online Access: http://hdl.handle.net/10356/78845
Institution: Nanyang Technological University
School: School of Electrical and Electronic Engineering
Subjects: Engineering::Electrical and electronic engineering
Degree: Master of Science (Computer Control and Automation)
Physical Description: 74 p.
File Format: application/pdf
Date Available: 2019-07-23
Record Updated: 2023-07-04
Country: Singapore
Collection: DR-NTU (NTU Library)
Description:
Schizophrenia is a chronic and severe mental disorder that affects an increasing number of people worldwide. Because the clinical diagnosis and assessment of mentally ill patients are subjective, there is a significant need to train new psychiatrists in a more objective way. We therefore aim to create a virtual robot with schizophrenic symptoms that gives a more objective picture of schizophrenic patients and can be used to coach psychiatrists on how to interact more productively with patients with schizophrenia. The speech, movement, facial expressions, posture and memory of the current virtual robot still need to be improved. In this dissertation, I focused on analyzing speech adaptation features in recordings of clinical interviews and then built a pipeline to implement speech adaptation on the avatar platform. We have audio recordings of 75 interviews: 50 between psychiatrists and schizophrenic patients and 25 between psychiatrists and healthy individuals. Three low-level speech features, namely pitch, speech rate and loudness, were extracted from both the participant channel and the psychiatrist channel. I then used the Granger causality test (GCT) to determine whether the participants' speech is influenced by the psychiatrists' voices, and applied a Gaussian mixture model (GMM) to model the distributions of pitch, speech rate and loudness for schizophrenic patients and healthy individuals separately. Next, I built a schizophrenic model and a healthy model that change the pitch, speech rate and loudness settings of the text-to-speech engine on the virtual human platform. After this implementation, the virtual human is able to dynamically adapt the pitch, speech rate and loudness of her speech based on the preceding conversation.
In addition, the dissertation discusses a multilayer perceptron (MLP) neural network as one possible way to solve this kind of input-output fitting problem.
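The record does not say how the Granger causality test was configured. As a minimal sketch, assume that the psychiatrist's and the participant's features (for example, mean pitch per speaking turn) have been aligned into two equal-length series. A lag-1 test then predicts the participant's current value from its own previous value, predicts it again with the psychiatrist's previous value added, and compares the two residual sums of squares with an F statistic. The function names, the lag order and the plain-Python least-squares solver are illustrative assumptions. Converting F into a p-value needs the F distribution (for example `scipy.stats.f.sf`), which the Python standard library does not provide.

```python
import random


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _ols_rss(rows, y):
    """Least-squares fit via the normal equations; returns the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def granger_f_stat(x, y, lag=1):
    """F statistic for the hypothesis that past values of x help predict y beyond y's own past."""
    if len(x) != len(y):
        raise ValueError("x and y must be aligned series of equal length")
    target = y[lag:]
    restricted, unrestricted = [], []
    for t in range(lag, len(y)):
        own = [y[t - i] for i in range(1, lag + 1)]
        other = [x[t - i] for i in range(1, lag + 1)]
        restricted.append([1.0] + own)            # intercept + y's own lags
        unrestricted.append([1.0] + own + other)  # intercept + y's lags + x's lags
    rss_r = _ols_rss(restricted, target)
    rss_u = _ols_rss(unrestricted, target)
    n, k = len(target), len(unrestricted[0])
    return ((rss_r - rss_u) / lag) / (rss_u / (n - k))


if __name__ == "__main__":
    # Synthetic series for illustration only (not data from the thesis).
    rng = random.Random(0)
    psych = [rng.gauss(0.0, 1.0) for _ in range(200)]
    part = [0.0] + [0.8 * psych[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("psychiatrist -> participant F:", granger_f_stat(psych, part))
    print("unrelated series -> participant F:", granger_f_stat(noise, part))
```

A large F relative to the F(lag, n - k) critical value suggests that the psychiatrist's previous value helps predict the participant's next one; a value near 1 suggests it does not.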
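The description states that a GMM was used to model the distributions of pitch, speech rate and loudness for each group, but it does not say how the fitted model drives the text-to-speech engine. The following minimal sketch makes two assumptions: each feature is modelled independently by a one-dimensional mixture fitted with expectation-maximization, and a TTS setting is obtained by sampling from that mixture. The function names and defaults are hypothetical. In practice, `sklearn.mixture.GaussianMixture` provides a maintained implementation.

```python
import math
import random


def _normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization (EM)."""
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, shared overall variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(min_var, sum((x - mu) ** 2 for x in xs) / n)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            p = [w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow for far-out points
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixture weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj)
    return weights, means, variances


def sample_gmm(weights, means, variances, rng):
    """Draw one value: choose a component by weight, then sample from that Gaussian."""
    j = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[j], math.sqrt(variances[j]))


if __name__ == "__main__":
    # Synthetic pitch values in Hz, for illustration only (not data from the thesis).
    rng = random.Random(0)
    pitch = [rng.gauss(120, 10) for _ in range(200)] + [rng.gauss(220, 15) for _ in range(200)]
    w, m, v = fit_gmm_1d(pitch, k=2)
    print("weights:", w, "means:", m)
    print("sampled pitch target:", sample_gmm(w, m, v, rng))
```

Fitting one mixture per group (schizophrenic versus healthy) and per feature gives two sets of parameters. Sampling from the matching set is one plausible way to pick the avatar's next setting, but that design choice is an assumption.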
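The record mentions the MLP only as an idea for input-output fitting. The following minimal plain-Python sketch of that idea assumes that the inputs are the psychiatrist's normalized pitch, speech rate and loudness and that the outputs are the avatar's three TTS settings. It uses a one-hidden-layer network with tanh units and a linear output layer, trained by stochastic gradient descent on squared error. The class and function names, layer size, learning rate and epoch count are hypothetical choices, not the dissertation's.

```python
import math
import random


class TinyMLP:
    """One-hidden-layer perceptron: tanh hidden units, linear outputs."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, x):
        return self._forward(x)[1]

    def sgd_step(self, x, target, lr):
        """One stochastic-gradient step on the loss 0.5 * sum((y - target) ** 2)."""
        h, y = self._forward(x)
        d_out = [yk - tk for yk, tk in zip(y, target)]
        # Hidden-layer error via tanh'(a) = 1 - tanh(a)^2, using the pre-update output weights.
        d_hid = [(1.0 - h[j] ** 2) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj

    def mean_squared_error(self, xs, ys):
        """Squared error summed over outputs, averaged over samples."""
        total = sum(sum((p - t) ** 2 for p, t in zip(self.predict(x), y)) for x, y in zip(xs, ys))
        return total / len(xs)


def train(model, xs, ys, epochs=200, lr=0.05, seed=0):
    """Plain SGD with a reshuffled sample order each epoch."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            model.sgd_step(xs[i], ys[i], lr)
    return model


if __name__ == "__main__":
    # Synthetic, normalized inputs/outputs for illustration only (not data from the thesis).
    rng = random.Random(1)
    xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(100)]
    ys = [[0.5 * x[0] + 0.2, -0.3 * x[1], 0.4 * x[2] - 0.1] for x in xs]
    model = TinyMLP(3, 8, 3)
    print("MSE before:", model.mean_squared_error(xs, ys))
    train(model, xs, ys)
    print("MSE after:", model.mean_squared_error(xs, ys))
```

Because the toy targets are linear, a plain linear regression would fit them equally well. The hidden layer only matters when the mapping from the psychiatrist's features to the avatar's settings is nonlinear.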