Audio pattern discovery and retrieval
This thesis explores unsupervised algorithms for pattern discovery and retrieval in audio and speech data. In this work, an audio pattern is defined as repeating audio content, such as repeating music segments or words and short phrases in speech recordings. The meaning of “pattern” is defined separat...
| Main Author: | Wang, Lei |
|---|---|
| Other Authors: | Chng Eng Siong; Li Haizhou |
| Format: | Theses and Dissertations |
| Language: | English |
| Published: | 2013 |
| Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
| Online Access: | https://hdl.handle.net/10356/51781 |
| Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-51781 |
record_format |
dspace |
spelling |
sg-ntu-dr.10356-51781 2023-03-04T00:41:33Z Audio pattern discovery and retrieval Wang, Lei; Chng Eng Siong; Li Haizhou; School of Computer Engineering; Emerging Research Lab DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition DOCTOR OF PHILOSOPHY (SCE) 2013-04-11T04:33:30Z 2012 Thesis Wang, L. (2012). Audio pattern discovery and retrieval. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/51781 10.32657/10356/51781 en 137 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Wang, Lei Audio pattern discovery and retrieval |
description |
This thesis explores unsupervised algorithms for pattern discovery and retrieval in audio and speech data. In this work, an audio pattern is defined as repeating audio content, such as repeating music segments or words and short phrases in speech recordings. The meaning of “pattern” is defined separately for each type of data: in music, repeating-pattern discovery extracts segments with a similar melody within a piece; in human speech, the same words or short phrases spoken by one or more speakers are treated as speech patterns; and in broadcast audio, repeated commercials and logo music are also considered patterns.
Previous work on audio pattern discovery either symbolizes the audio signal into token sequences followed by a text-based search, or applies brute-force search techniques such as the self-similarity matrix and Dynamic Time Warping (DTW). The symbolization process, which relies on Vector Quantization or other modelling techniques, may suffer from misclassification errors, while exhaustive search incurs a high computational cost and can be affected by channel distortion and speaker variation in the audio data. These limitations motivate me to explore more efficient and robust approaches to automatically detect repeating information in audio data.
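For context on the brute-force baseline mentioned above, below is a minimal sketch of Dynamic Time Warping between two feature sequences (e.g. MFCC frames). It illustrates the general technique, not the thesis's implementation; the function name, the Euclidean frame distance and the absence of path constraints are assumptions made for brevity.

```python
import numpy as np

def dtw_distance(x, y):
    """Minimal DTW between feature sequences x (n, d) and y (m, d).

    Illustrative sketch only: Euclidean frame distance and no band
    constraints; practical systems usually constrain the warp path.
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])  # local frame distance
            # extend the cheapest of the three allowed predecessor paths
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Example: two short random "feature" sequences of different lengths
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(40, 13))   # e.g. 40 frames of 13-dim MFCCs
    b = rng.normal(size=(55, 13))
    print(dtw_distance(a, b))
```

The quadratic cost of this alignment, multiplied over all candidate segment pairs, is exactly the computational burden the abstract refers to.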
In this thesis, different unsupervised techniques are examined to analyze music and speech separately. For music, an efficient approach that extends Ukkonen's suffix tree construction algorithm is proposed to detect repeating segments. For speech data, an iterative merging approach based on the Acoustic Segment Model (ASM) is proposed to discover recurrent phrases and words.
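To illustrate why a suffix structure helps once audio has been symbolized into tokens, here is a hedged sketch that finds the longest repeated token run with a naive suffix-array-style comparison. It is a stand-in for the idea only; it does not reproduce the thesis's extension of Ukkonen's linear-time suffix tree construction, and the function name and toy data are assumptions.

```python
def longest_repeated_run(tokens):
    """Find the longest repeated subsequence in a symbolized audio stream.

    Naive suffix-array-style illustration (sort all suffixes, then compare
    lexicographic neighbours). A suffix tree built with Ukkonen's algorithm
    achieves the same in linear time; this sketch is O(n^2 log n).
    """
    n = len(tokens)
    suffixes = sorted(range(n), key=lambda i: tokens[i:])
    best_len, best_pos = 0, 0
    for a, b in zip(suffixes, suffixes[1:]):
        # length of the common prefix of two adjacent suffixes
        k = 0
        while a + k < n and b + k < n and tokens[a + k] == tokens[b + k]:
            k += 1
        if k > best_len:
            best_len, best_pos = k, a
    return tokens[best_pos:best_pos + best_len]

# Example: a toy token sequence where the "chorus" labels repeat
if __name__ == "__main__":
    seq = list("abcXYZdefXYZghi")
    print(longest_repeated_run(seq))  # -> ['X', 'Y', 'Z']
```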
This thesis also explores techniques for searching audio patterns in broadcast audio, which consists of diverse content such as speech, music and songs, commercials, sound effects and background noise. Existing audio pattern retrieval techniques focus only on specific audio types, which limits their applicability. In this work, a robust query-by-example framework is proposed for retrieving mixed speech and music patterns, in which the ASM is also examined for modelling music data.
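To make the query-by-example idea concrete, the following is a rough sketch of subsequence matching: a query feature sequence is slid over a longer recording and scored with a frame-averaged distance. The fixed-length windows, distance measure and function name are illustrative assumptions; this is not the ASM-based framework proposed in the thesis.

```python
import numpy as np

def query_by_example(query, stream, hop=5):
    """Slide a query feature sequence over a longer stream and return the
    best-matching start frame and its score (lower is better).

    Illustrative sketch: fixed-length windows with mean Euclidean frame
    distance; the thesis instead models segments with ASMs and allows
    non-linear time alignment.
    """
    q_len = len(query)
    best_score, best_start = np.inf, -1
    for start in range(0, len(stream) - q_len + 1, hop):
        window = stream[start:start + q_len]
        score = np.mean(np.linalg.norm(window - query, axis=1))
        if score < best_score:
            best_score, best_start = score, start
    return best_start, float(best_score)

# Example: plant a copy of the query inside a random "broadcast" stream
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    query = rng.normal(size=(30, 13))
    stream = rng.normal(size=(500, 13))
    stream[200:230] = query          # the repeated pattern to be retrieved
    print(query_by_example(query, stream, hop=1))  # -> (200, 0.0)
```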
To verify the research, the proposed techniques are applied to public-domain audio databases, namely the TIDIGITS corpus and the TRECVID database, as well as a self-collected set of 30 English pop songs. The experimental results show that the proposed methods are robust and outperform existing techniques. |
author2 |
Chng Eng Siong |
author_facet |
Chng Eng Siong Wang, Lei |
format |
Theses and Dissertations |
author |
Wang, Lei |
author_sort |
Wang, Lei |
title |
Audio pattern discovery and retrieval |
title_short |
Audio pattern discovery and retrieval |
title_full |
Audio pattern discovery and retrieval |
title_fullStr |
Audio pattern discovery and retrieval |
title_full_unstemmed |
Audio pattern discovery and retrieval |
title_sort |
audio pattern discovery and retrieval |
publishDate |
2013 |
url |
https://hdl.handle.net/10356/51781 |
_version_ |
1759854191393636352 |