Image processing techniques for speech signal processing

Bibliographic Details
Main Author: Leow, Su Jun
Other Authors: School of Computer Science and Engineering
Format: Theses and Dissertations
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/73231
Institution: Nanyang Technological University
Description
Summary: This research examines the use of visual representations and image processing techniques in speech processing applications. It is motivated by the observation that human spectrogram readers can recognize words from visual cues in the spectrogram alone, even across differing channels and speakers. This suggests that features in the spectrogram image carry important information that can be harnessed for speech processing. It is postulated that the image representation better embeds the contextual information that humans need to understand speech. In contrast, commonly used speech features, such as Mel Frequency Cepstral Coefficients (MFCC), are largely frame-based, describing each short frame on its own. The image representation of speech can therefore complement existing speech features and improve performance on existing speech tasks. In this work, we develop and apply a solely visual approach to speech problems, demonstrated on two concrete tasks: unsupervised speech segmentation and the detection of unit-selection-based synthesized speech for anti-spoofing. We first provide the necessary background by introducing common speech and image processing problems and drawing parallels between them. Next, we introduce an image representation of speech that enables the application of image processing techniques. We conclude the background with a survey of past attempts to apply image processing techniques to speech and acoustic processing tasks. In the experiments section, we then give an in-depth discussion of solving the two example speech tasks with a solely image-based solution. Finally, we close the thesis with conclusions and future work.
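
The summary does not say how the thesis constructs its image representation of speech. As a rough illustration of the general idea only, the sketch below turns a signal into a greyscale spectrogram image: short-time log-magnitude spectra are stacked into a 2D grid of pixel values. Everything in it is an assumption made for illustration, not the author's method: the function name `spectrogram_image`, the frame length, the hop size, the 80 dB dynamic range, and the synthetic test tone. It uses a naive DFT from the standard library to stay dependency-free; a practical implementation would use an FFT library.

```python
import cmath
import math


def spectrogram_image(signal, frame_len=64, hop=32):
    """Return a 2D list of 0-255 pixels (rows = frequency bins, columns = frames).

    Hypothetical helper for illustration; parameters are arbitrary defaults.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT: log-magnitude in dB for each frequency bin.
        col = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            col.append(20 * math.log10(abs(acc) + 1e-10))
        columns.append(col)
    # Clip to an 80 dB range below the peak, then map to 0-255 grey levels.
    peak = max(max(col) for col in columns)
    floor = peak - 80.0
    pixels = [[round(255 * (max(v, floor) - floor) / 80.0) for v in col]
              for col in columns]
    # Transpose so each row is a frequency bin and each column a time frame.
    return [list(row) for row in zip(*pixels)]


# Assumed test signal: a 1 kHz tone sampled at 8 kHz (not from the thesis).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
image = spectrogram_image(tone)
```

Each column of `image` is still computed from a single frame, but neighbouring columns sit side by side in the grid, so image operations such as edge detection or region segmentation can act across time and frequency together. This is the kind of context that a single frame-based feature vector does not expose directly.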