Speaker diarization of news broadcasts and meeting recordings

Bibliographic Details
Main Author: Koh, Eugene Chin Wei
Other Authors: Chng Eng Siong
Format: Theses and Dissertations
Language: English
Published: 2009
Online Access:https://hdl.handle.net/10356/15707
Institution: Nanyang Technological University

Additional Details
Other Contributors: Li Haizhou
School: School of Computer Engineering
Lab: Emerging Research Lab
Subject: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Degree: Master of Engineering (SCE)
Date Available: 2009-05-14
Extent: 128 p.
File Format: PDF (application/pdf)
DOI: 10.32657/10356/15707
Citation: Koh, E. C. W. (2009). Speaker diarization of news broadcasts and meeting recordings. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/15707
Collection: DR-NTU (NTU Library)

Description:

Given an audio recording, the task of speaker diarization can be summarized as answering the question "Who spoke when?". This thesis reviews the techniques and issues involved in performing speaker diarization on broadcast news recordings as well as meeting recordings.

The broadcast news domain is generally regarded as simpler because turn-taking between speakers is better controlled and audio quality tends to be higher. The typical approach for this domain consists of two steps: speaker segmentation followed by speaker clustering. The Bayesian Information Criterion (BIC) has been a very popular distance measure for both steps, and experiments conducted in this work confirmed its effectiveness for segmentation and clustering. Further speaker segmentation experiments used Hotelling's T² statistic to augment the BIC; while this sped up processing, the segmentation F-score obtained did not match that reported in the literature. A novel speaker clustering approach was also introduced, in which polynomial-expanded feature vectors were used to compute the distance between clusters. This approach was found to produce results comparable to those of the BIC.

To address speaker diarization in the meeting domain, a diarization system was developed and submitted to the NIST Rich Transcription 2007 (RT-07) evaluation. The system exploited the diversity of meeting recording channels by performing Time Delay of Arrival (TDOA) estimation using a Normalized Least Mean Squares (NLMS) filter. Subsequent performance improvements came from adding a cluster purification module and a Non-Speech & Silence Removal (NS&SR) module. An overall Diarization Error Rate (DER) of 15.32% was obtained on the RT-07 corpus, a score that was competitive with the other entrants in the evaluation.
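
The abstract names the BIC as the distance measure for segmentation and clustering, with Hotelling's T² used to augment it during segmentation. The sketch below illustrates one common textbook form of both statistics; it is not the thesis implementation. It assumes each analysis window is a matrix of frame-level acoustic features (for example MFCCs) modelled by a single full-covariance Gaussian, uses a pooled-covariance two-sample T² as a cheap scan over candidate change points, and confirms the best candidate with ΔBIC. The function names, the penalty weight `lam`, the candidate step size, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def _logdet_cov(x):
    # Log-determinant of the maximum-likelihood (biased) covariance of x (frames x dims).
    cov = np.cov(x, rowvar=False, bias=True)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance not positive definite; window too short")
    return logdet


def delta_bic(x, split, lam=1.0):
    """Delta-BIC for a change at frame `split` of window x (N x d).

    Positive values favour modelling x[:split] and x[split:] with separate
    Gaussians, i.e. a speaker change."""
    n, d = x.shape
    n1, n2 = split, n - split
    # Log-likelihood ratio of two Gaussians versus one (ML covariances).
    gain = 0.5 * (n * _logdet_cov(x)
                  - n1 * _logdet_cov(x[:split])
                  - n2 * _logdet_cov(x[split:]))
    # Penalty for one extra mean vector and full covariance matrix.
    n_params = d + d * (d + 1) / 2
    return gain - 0.5 * lam * n_params * np.log(n)


def hotelling_t2(x, split):
    """Two-sample Hotelling's T^2 between x[:split] and x[split:] (pooled covariance)."""
    x1, x2 = x[:split], x[split:]
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
              + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(pooled, diff)


if __name__ == "__main__":
    # Synthetic stand-in for 12-dim features: one "speaker" change at frame 300.
    rng = np.random.default_rng(0)
    x = np.vstack([rng.normal(0.0, 1.0, (300, 12)), rng.normal(1.5, 1.0, (300, 12))])
    # Cheap T^2 scan over candidate frames, then BIC verification of the best one.
    best = max(range(50, 551, 10), key=lambda t: hotelling_t2(x, t))
    print("candidate:", best, "change accepted:", delta_bic(x, best) > 0)
```

A positive ΔBIC favours a speaker change at the tested frame. For clustering, the same quantity computed on the concatenated frames of two clusters (with `split` at their boundary) can serve as a merge criterion, for example merging the closest pair while its ΔBIC stays negative.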