Analysis and application of speech adaptation on avatar system

Bibliographic Details
Main Author: Luo, Fei
Other Authors: Justin Dauwels
Format: Theses and Dissertations
Language: English
Published: 2019
Online Access: http://hdl.handle.net/10356/78845
Institution: Nanyang Technological University
Description
Summary: Schizophrenia is a chronic and severe mental disorder that affects an increasing number of people worldwide. The clinical diagnosis and assessment of mentally ill patients are subjective, which creates a significant need to train new psychiatrists in a more objective way. Hence, we aim to create a virtual robot with schizophrenic symptoms that provides a more objective overview of schizophrenic patients and can further be used to coach psychiatrists on how to have more productive interactions with patients with schizophrenia. The speech, movement, facial expression, posture, and memory of the current virtual robot need to be improved. In this dissertation, I focused on analyzing speech adaptation features from recordings of clinical interviews and then built a pipeline to implement speech adaptation on the avatar platform. We have audio recordings of 75 interviews, of which 50 are between psychiatrists and schizophrenic patients and 25 are between psychiatrists and healthy individuals. Next, three low-level speech features, namely pitch, speech rate, and loudness, are extracted from both the participant channel and the psychiatrist channel. Then, I used the Granger causality test (GCT) to test whether participants' speech is influenced by the psychiatrists' voice, and applied a Gaussian Mixture Model (GMM) to model the distributions of pitch, speech rate, and loudness for schizophrenic patients and healthy individuals respectively. Then, I built a schizophrenic model and a healthy model to change the pitch, speech rate, and loudness settings of the text-to-speech engine on the virtual human platform. After the implementation, the virtual human is able to dynamically adapt her speech in pitch, speech rate, and loudness based on the previous conversation.
In addition, this dissertation discusses a multilayer perceptron (MLP) neural network, which offers one approach to this kind of input-output fitting problem.
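
The Granger causality step mentioned in the summary can be illustrated with a small sketch. The record does not describe the thesis's actual implementation, so everything below is an assumption made for illustration: the function names, the lag order of 2, and the synthetic "pitch" series are hypothetical, and the test is written directly with NumPy least squares rather than with any particular statistics package. It compares a model that predicts one speaker's feature from its own past against a model that also sees the other speaker's past, and reports the resulting F statistic.

```python
import numpy as np


def lagged_design(target, driver, lags):
    """Build the response vector and the restricted/unrestricted regressors."""
    n = len(target)
    y = target[lags:]
    # Column k-1 holds the series shifted back by k steps.
    own = np.column_stack([target[lags - k:n - k] for k in range(1, lags + 1)])
    other = np.column_stack([driver[lags - k:n - k] for k in range(1, lags + 1)])
    const = np.ones((n - lags, 1))
    restricted = np.hstack([const, own])            # own past only
    unrestricted = np.hstack([const, own, other])   # own past + driver's past
    return y, restricted, unrestricted


def residual_sum_of_squares(y, X):
    """Fit ordinary least squares and return the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(target, driver, lags=2):
    """F statistic for the hypothesis that `driver` does not Granger-cause `target`."""
    target = np.asarray(target, dtype=float)
    driver = np.asarray(driver, dtype=float)
    y, X_r, X_u = lagged_design(target, driver, lags)
    rss_r = residual_sum_of_squares(y, X_r)
    rss_u = residual_sum_of_squares(y, X_u)
    dof = len(y) - X_u.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic data (hypothetical): the participant's feature follows the
# psychiatrist's feature with a one-step delay, but not the other way round.
rng = np.random.default_rng(0)
n = 300
psychiatrist = rng.normal(size=n)
participant = np.zeros(n)
for t in range(1, n):
    participant[t] = (0.3 * participant[t - 1]
                      + 0.8 * psychiatrist[t - 1]
                      + 0.1 * rng.normal())

f_forward = granger_f(participant, psychiatrist)   # psychiatrist -> participant
f_reverse = granger_f(psychiatrist, participant)   # participant -> psychiatrist
print(f"psychiatrist -> participant: F = {f_forward:.1f}")
print(f"participant -> psychiatrist: F = {f_reverse:.1f}")
```

A large F statistic in one direction and a small one in the other is the pattern that would suggest the participant adapts to the psychiatrist. Turning the statistic into a p-value requires the F distribution with `lags` and `dof` degrees of freedom, which is omitted here to keep the sketch dependency-free.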