DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM

Online transcription is the process of determining "who speaks what" from an audio stream of a conversation. It differs from offline transcription, in which the entire conversation has already been recorded. Online recognition is needed in s...

Bibliographic Details
Main Author:	Hardianto Satriawan - NIM: 23515053 , Cil
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/26281
Institution:	Institut Teknologi Bandung

id	id-itb.:26281
spelling	id-itb.:26281 2018-03-20T14:39:59Z DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM Hardianto Satriawan - NIM: 23515053 , Cil Indonesia Theses INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/26281 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Online transcription is the process of determining "who speaks what" from an audio stream of a conversation. It differs from offline transcription, in which the entire conversation has already been recorded. Online recognition is needed when the contents and speakers of an ongoing conversation must be recognized immediately, e.g., automatic transcription of live broadcast talk shows and meetings. It is also needed for applications that use transcripts as source data for further processing, such as sentiment analysis of an ongoing phone call. Here, we propose an online GMM-UBM speaker recognition system and compare it with a baseline offline system. The proposed online speaker recognition system recognizes speakers immediately after a speaker change, using the Bayesian Information Criterion (BIC) and Log Mel-frequency Energies (LMFE) as metrics. As a post-processing step, a rolling window of speaker segments is gathered and the time-weighted average speaker likelihoods are calculated. The highest-scoring speaker within the window is then forwarded as the prediction. Speaker error rates (SER) of 25.5% and 18.5% were obtained for the proposed online system and the baseline offline system, respectively. On the other hand, the online transcription system achieved an average latency of 0.21 times the duration of the input segment, compared to 1.10 for the offline system. The online speaker recognition system was then integrated with an existing Indonesian-language online speech recognition system to produce the final online transcription system.
format	Theses
author	Hardianto Satriawan - NIM: 23515053 , Cil
spellingShingle	Hardianto Satriawan - NIM: 23515053 , Cil DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM
author_facet	Hardianto Satriawan - NIM: 23515053 , Cil
author_sort	Hardianto Satriawan - NIM: 23515053 , Cil
title	DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM
title_short	DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM
title_full	DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM
title_fullStr	DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM
title_full_unstemmed	DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM
title_sort	development of a gmm-ubm based online transcription system
url	https://digilib.itb.ac.id/gdl/view/26281
_version_	1823635406743142400
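
The post-processing step mentioned in the description (a rolling window of speaker segments, scored by time-weighted average speaker likelihood) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the class name `RollingSpeakerWindow`, the window size, the use of per-segment log-likelihoods, and the requirement that every segment carries a score for every enrolled speaker are all assumptions made for illustration only.

```python
from collections import deque


def time_weighted_prediction(segments):
    """Return (best_speaker, averaged_scores) for a window of segments.

    `segments` is an iterable of (duration_seconds, scores) pairs, where
    `scores` maps a speaker label to that segment's score (assumed here
    to be a log-likelihood) and every segment scores the same speakers.
    """
    segments = list(segments)
    total = sum(duration for duration, _ in segments)
    if total <= 0:
        raise ValueError("window must contain a positive total duration")

    weighted = {}
    for duration, scores in segments:
        for speaker, score in scores.items():
            # Longer segments contribute proportionally more to the average.
            weighted[speaker] = weighted.get(speaker, 0.0) + duration * score

    averaged = {spk: value / total for spk, value in weighted.items()}
    return max(averaged, key=averaged.get), averaged


class RollingSpeakerWindow:
    """Keeps the most recent `size` segments (hypothetical helper)."""

    def __init__(self, size=3):
        # deque with maxlen drops the oldest segment automatically.
        self.segments = deque(maxlen=size)

    def push(self, duration, scores):
        """Add a new segment and return the current window prediction."""
        self.segments.append((duration, scores))
        return time_weighted_prediction(self.segments)


window = RollingSpeakerWindow(size=2)
first, _ = window.push(1.0, {"A": -1.0, "B": -3.0})
second, _ = window.push(4.0, {"A": -5.0, "B": -2.0})
third, third_scores = window.push(1.0, {"A": -1.0, "B": -6.0})
```

In this toy run the short third segment favours speaker "A" on its own, but the long second segment still in the window outweighs it, so the window prediction stays with "B". That smoothing effect is the usual motivation for weighting by duration; how the thesis chooses the window length is not stated in the record.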

Similar Items