Reconstruction of natural sounding speech from whispers
This thesis explores reconstruction of natural sounding speech from whispers. As a broad research class, the generation of normally phonated speech from whispers can be useful in several types of application from different scientific fields ranging from communications to biomedical engineering. The...
Main Author: | Sharifzadeh, Hamid Reza |
---|---|
Other Authors: | Ian Vince McLoughlin |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2011 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing |
Online Access: | https://hdl.handle.net/10356/46426 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-46426 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-46426 2023-03-04T00:44:58Z Reconstruction of natural sounding speech from whispers Sharifzadeh, Hamid Reza Ian Vince McLoughlin School of Computer Engineering Parallel and Distributed Computing Centre DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing DOCTOR OF PHILOSOPHY (SCE) 2011-12-06T01:49:48Z 2011-12-06T01:49:48Z 2011 2011 Thesis Sharifzadeh, H. R. (2011). Reconstruction of natural sounding speech from whispers. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/46426 10.32657/10356/46426 en 151 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing |
spellingShingle |
DRNTU::Engineering::Electrical and electronic engineering::Electronic systems::Signal processing Sharifzadeh, Hamid Reza Reconstruction of natural sounding speech from whispers |
description |
This thesis explores the reconstruction of natural-sounding speech from whispers. As a broad research area, generating normally phonated speech from whispers is useful in applications ranging from communications to biomedical engineering. The primary focus of the thesis is therefore to investigate suitable solutions and algorithms for regenerating naturally phonated speech from whispers. Unlike in other speech processing fields, many aspects of such reconstruction remain unresolved despite its useful applications. In particular, the outcome of this research has at least two immediate applications that differ in form but share similar solutions: a) reconstructing natural speech for laryngectomy patients, and b) restoring naturally pitched speech in cell phone or telephone communication when one party whispers for privacy or security reasons. This thesis presents a solution for converting whispers to fully phonated speech by modifying the code-excited linear prediction (CELP) codec. It also presents a novel method for spectral enhancement and formant smoothing during reconstruction, which uses a probability mass-density function to identify reliable formant trajectories in whispers and applies spectral modifications accordingly. The method relies on the observation that, whilst the pitch generation mechanism of patients with larynx damage is typically unusable, the remaining components of the speech production apparatus may be largely unaffected. The approach outlined here allows patients to regain their ability to speak simply by whispering into an external prosthesis, yielding a more natural-sounding voice than alternative solutions. Because whispered speech is the core input of the system, the acoustic features of whispers also need to be considered.
Despite the everyday nature of whispering and its undoubted usefulness in vocal communication, whispers have received relatively little research attention to date, apart from some studies analysing the main whispered vowels and some quite general estimates of whispered speech characteristics. In particular, a classic vowel space determination has been lacking for whispers. For voiced speech, this type of information has played an important role in the development and testing of recognition and processing theories over the past few decades, and it can be expected to be equally useful for whisper-mode communication and recognition systems. This thesis also aims to redress this shortfall by presenting a vowel formant space for whispered speech and comparing the results with corresponding phonated samples. |
author2 |
Ian Vince McLoughlin |
author_facet |
Ian Vince McLoughlin Sharifzadeh, Hamid Reza |
format |
Theses and Dissertations |
author |
Sharifzadeh, Hamid Reza |
author_sort |
Sharifzadeh, Hamid Reza |
title |
Reconstruction of natural sounding speech from whispers |
title_sort |
reconstruction of natural sounding speech from whispers |
publishDate |
2011 |
url |
https://hdl.handle.net/10356/46426 |
_version_ |
1759855823362719744 |