Image processing techniques for speech signal processing
The purpose of this research is to examine the use of visual representation and image processing techniques for speech processing applications. This is inspired by the fact that human spectrogram readers can rely solely on visual cues in the spectrogram to perform recognition of words, even in the p...
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/73231 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-73231 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-73231 2023-03-04T00:47:47Z Image processing techniques for speech signal processing Leow, Su Jun School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision The purpose of this research is to examine the use of visual representation and image processing techniques for speech processing applications. This is inspired by the fact that human spectrogram readers can rely solely on visual cues in the spectrogram to perform recognition of words, even in the presence of differing channels and speakers. This suggests that features on the image spectrogram carry important information that can be harnessed for speech processing applications. It is postulated that the image representation better embeds contextual information that is required for human understanding of the speech context. Unfortunately, commonly used speech features, such as the Mel Frequency Cepstral Coefficients (MFCC), are largely frame-based. Therefore, the image representation of speech can serve to complement existing speech features to improve performance of existing speech tasks. In this work, we developed and applied a solely visual approach to solve speech problems. Two concrete examples of its application are given: unsupervised speech segmentation and the detection of unit-selection-based synthesized speech for anti-spoofing. We first provide the necessary background by introducing common speech and image processing problems, and then draw parallels between them. Next, we introduce an image representation of speech to enable the application of image processing techniques. We then conclude the background with a survey of past attempts that use image processing techniques on speech and acoustic processing tasks. Next, in the experiments section, we give an in-depth discussion of solving the two example speech tasks using a solely image-based solution. Finally, we wrap up the thesis with conclusions and future work. Master of Engineering (SCE) 2018-01-29T01:31:30Z 2018-01-29T01:31:30Z 2018 Thesis Leow, S. J. (2018). Image processing techniques for speech signal processing. Master's thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/73231 10.32657/10356/73231 en 100 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Leow, Su Jun Image processing techniques for speech signal processing |
description |
The purpose of this research is to examine the use of visual representation and image processing techniques for speech processing applications. This is inspired by the fact that human spectrogram readers can rely solely on visual cues in the spectrogram to perform recognition of words, even in the presence of differing channels and speakers. This suggests that features on the image spectrogram carry important information that can be harnessed for speech processing applications. It is postulated that the image representation better embeds contextual information that is required for human understanding of the speech context. Unfortunately, commonly used speech features, such as the Mel Frequency Cepstral Coefficients (MFCC), are largely frame-based. Therefore, the image representation of speech can serve to complement existing speech features to improve performance of existing speech tasks. In this work, we developed and applied a solely visual approach to solve speech problems. Two concrete examples of its application are given: unsupervised speech segmentation and the detection of unit-selection-based synthesized speech for anti-spoofing. We first provide the necessary background by introducing common speech and image processing problems, and then draw parallels between them. Next, we introduce an image representation of speech to enable the application of image processing techniques. We then conclude the background with a survey of past attempts that use image processing techniques on speech and acoustic processing tasks. Next, in the experiments section, we give an in-depth discussion of solving the two example speech tasks using a solely image-based solution. Finally, we wrap up the thesis with conclusions and future work. |
author2 |
School of Computer Science and Engineering |
author_facet |
School of Computer Science and Engineering Leow, Su Jun |
format |
Theses and Dissertations |
author |
Leow, Su Jun |
author_sort |
Leow, Su Jun |
title |
Image processing techniques for speech signal processing |
title_short |
Image processing techniques for speech signal processing |
title_full |
Image processing techniques for speech signal processing |
title_fullStr |
Image processing techniques for speech signal processing |
title_full_unstemmed |
Image processing techniques for speech signal processing |
title_sort |
image processing techniques for speech signal processing |
publishDate |
2018 |
url |
http://hdl.handle.net/10356/73231 |
_version_ |
1759857548431720448 |