Image processing techniques for speech signal processing

The purpose of this research is to examine the use of visual representation and image processing techniques for speech processing applications. This is inspired by the fact that human spectrogram readers can rely solely on visual cues in the spectrogram to perform recognition of words, even in the p...

Bibliographic Details
Main Author: Leow, Su Jun
Other Authors: School of Computer Science and Engineering
Format: Theses and Dissertations
Language: English
Published: 2018
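Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision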
Online Access: http://hdl.handle.net/10356/73231
Institution: Nanyang Technological University
id sg-ntu-dr.10356-73231
record_format dspace
spelling sg-ntu-dr.10356-73231 2023-03-04T00:47:47Z Image processing techniques for speech signal processing Leow, Su Jun School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Master of Engineering (SCE) 2018-01-29T01:31:30Z 2018-01-29T01:31:30Z 2018 Thesis Leow, S. J. (2018). Image processing techniques for speech signal processing. Master's thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/73231 10.32657/10356/73231 en 100 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
description The purpose of this research is to examine the use of visual representations and image processing techniques for speech processing applications. It is inspired by the fact that human spectrogram readers can recognize words from visual cues in the spectrogram alone, even across differing channels and speakers. This suggests that features of the spectrogram image carry important information that can be harnessed for speech processing. It is postulated that the image representation better embeds the contextual information required for human understanding of speech. By contrast, commonly used speech features, such as Mel Frequency Cepstral Coefficients (MFCC), are largely frame-based. The image representation of speech can therefore complement existing speech features and improve performance on existing speech tasks. In this work, we develop and apply a solely visual approach to speech problems. Two concrete applications are given: unsupervised speech segmentation, and the detection of unit-selection-based synthesized speech for anti-spoofing. We first provide the necessary background by introducing common speech and image processing problems and drawing parallels between them. Next, we introduce an image representation of speech that enables the application of image processing techniques. We conclude the background with a survey of past attempts to use image processing techniques on speech and acoustic processing tasks. In the experiments section, we then discuss in depth how the two example speech tasks are solved with a solely image-based solution. Finally, we close the thesis with conclusions and future work.
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Leow, Su Jun
format Theses and Dissertations
author Leow, Su Jun
author_sort Leow, Su Jun
title Image processing techniques for speech signal processing
publishDate 2018
url http://hdl.handle.net/10356/73231
_version_ 1759857548431720448