Speaker diarization of news broadcasts and meeting recordings
Given an audio recording, the task of speaker diarization can be summarized as answering the question “Who spoke when?”. This thesis reviews the techniques and issues involved in performing speaker diarization on broadcast news recordings and on meeting recordings. The br...
Main Author: | Koh, Eugene Chin Wei |
---|---|
Other Authors: | Chng Eng Siong |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2009 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
Online Access: | https://hdl.handle.net/10356/15707 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-15707 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-15707 2023-03-04T00:46:23Z Speaker diarization of news broadcasts and meeting recordings Koh, Eugene Chin Wei Chng Eng Siong Li Haizhou School of Computer Engineering Emerging Research Lab DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition MASTER OF ENGINEERING (SCE) 2009-05-14T02:52:07Z 2009-05-14T02:52:07Z 2009 2009 Thesis Koh, E. C. W. (2009). Speaker diarization of news broadcasts and meeting recordings. Master’s thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/15707 10.32657/10356/15707 en 128 p. application/pdf |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition |
spellingShingle | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition Koh, Eugene Chin Wei Speaker diarization of news broadcasts and meeting recordings |
description | Given an audio recording, the task of speaker diarization can be summarized as answering the question “Who spoke when?”. This thesis reviews the techniques and issues involved in performing speaker diarization on broadcast news recordings and on meeting recordings. The broadcast news domain is generally regarded as simpler because turn-taking between speakers is better controlled and audio quality tends to be higher. The typical approach for this domain consists of two steps: speaker segmentation followed by speaker clustering. The Bayesian Information Criterion (BIC) has been a very popular distance measure for both speaker segmentation and clustering, and experiments were conducted that confirmed its effectiveness for both tasks. Further speaker segmentation experiments were performed using Hotelling’s T² statistic to augment the BIC. It was observed that while this does speed up processing, the segmentation F-score obtained does not match that reported in the literature. A novel speaker clustering approach was also introduced in which polynomial-expanded feature vectors were used to compute the distance between clusters. It was found that this approach could produce results comparable to those of the BIC. To address speaker diarization in the meeting domain, a diarization system was developed and submitted to the NIST Rich Transcription 2007 (RT-07) evaluation. This system exploited the diversity of meeting recording channels by performing Time Delay of Arrival (TDOA) estimation using a Normalized Least Mean Squares (NLMS) filter. Subsequent performance enhancements were delivered by adding a cluster purification module and a Non-Speech & Silence Removal (NS&SR) module. An overall Diarization Error Rate (DER) of 15.32% was obtained on the RT-07 corpus. This score was found to be competitive with those of the other entrants in the evaluation. |
author2 | Chng Eng Siong |
author_facet | Chng Eng Siong Koh, Eugene Chin Wei |
format | Theses and Dissertations |
author | Koh, Eugene Chin Wei |
author_sort | Koh, Eugene Chin Wei |
title | Speaker diarization of news broadcasts and meeting recordings |
title_short | Speaker diarization of news broadcasts and meeting recordings |
title_full | Speaker diarization of news broadcasts and meeting recordings |
title_fullStr | Speaker diarization of news broadcasts and meeting recordings |
title_full_unstemmed | Speaker diarization of news broadcasts and meeting recordings |
title_sort | speaker diarization of news broadcasts and meeting recordings |
publishDate | 2009 |
url | https://hdl.handle.net/10356/15707 |
_version_ | 1759854564344856576 |
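
The abstract names the Bayesian Information Criterion (BIC) as the distance measure used for both segmentation and clustering. As a rough illustration only, and not the thesis's own implementation, the sketch below computes the commonly used ΔBIC between two adjacent segments, assuming each segment is modelled by a single full-covariance Gaussian. The function name `delta_bic`, the penalty weight `lam`, and the synthetic features are assumptions made for this example; a positive value suggests the two segments are better modelled separately, which would indicate a speaker change.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between two segments of feature frames.

    x, y : arrays of shape (n_frames, n_dims), e.g. cepstral features.
    lam  : penalty weight (hypothetical tuning parameter, 1.0 by default).
    Returns a float; values above 0 favour modelling x and y separately.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.vstack([x, y])              # merged segment
    n1, n2, n = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet_cov(frames):
        # Maximum-likelihood (biased) covariance of the frames.
        cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
        _, value = np.linalg.slogdet(cov)
        return value

    # Log-likelihood gain from using two Gaussians instead of one.
    gain = 0.5 * (n * logdet_cov(z) - n1 * logdet_cov(x) - n2 * logdet_cov(y))
    # Model-complexity penalty: d mean terms plus d(d+1)/2 covariance terms.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - lam * penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speaker_a1 = rng.normal(0.0, 1.0, size=(300, 4))
    speaker_a2 = rng.normal(0.0, 1.0, size=(300, 4))
    speaker_b = rng.normal(5.0, 1.0, size=(300, 4))
    print("same source:     ", delta_bic(speaker_a1, speaker_a2))
    print("different source:", delta_bic(speaker_a1, speaker_b))
```

In a segmentation pass this quantity would typically be evaluated at candidate boundaries inside a sliding window, and in clustering it would be evaluated between pairs of clusters, merging the pair with the lowest value while that value stays below zero.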