DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM
Online transcription is the process of determining "who speaks what" from an audio stream containing conversation as input. It differs from offline transcription where the entirety of the conversation is already recorded. Online recognition is needed in s...
Main Author: | Hardianto Satriawan - NIM: 23515053 , Cil |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/26281 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:26281 |
---|---|
spelling |
id-itb.:26281 2018-03-20T14:39:59Z |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
description |
Online transcription is the process of determining "who speaks what" from an audio stream of a conversation. It differs from offline transcription, where the entire conversation has already been recorded. Online recognition is needed when the contents and speakers of an ongoing conversation must be recognized immediately, e.g., automatic transcription of live broadcast talk shows and meetings. It is also needed for applications that use transcripts as source data for further processing, such as sentiment analysis of an ongoing phone call. Here, we propose an online GMM-UBM speaker recognition system and compare it with a baseline offline system. The proposed online speaker recognition system recognizes speakers immediately after a speaker change using the Bayesian Information Criterion (BIC) and Log Mel-frequency Energies (LMFE) as metrics. As a post-processing step, a rolling window of speaker segments is gathered and the time-weighted average speaker likelihoods are calculated. The highest-scoring speaker within the window is then forwarded as the prediction. Speaker error rates (SER) of 25.5% and 18.5% were obtained for the proposed online system and the baseline offline system, respectively. On the other hand, the online system achieved an average latency of 0.21 times the duration of the input segment, compared to 1.10 for the offline system. The online speaker recognition system was then integrated with an existing Indonesian-language online speech recognition system to produce the final online transcription system. |
format |
Theses |
author |
Hardianto Satriawan - NIM: 23515053 , Cil |
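
The post-processing step described in the abstract (a rolling window of speaker segments, time-weighted average speaker likelihoods, highest-scoring speaker forwarded as the prediction) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the class name `RollingSpeakerVote`, the window size of three segments, and the use of per-segment average log-likelihood scores are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import deque
from typing import Deque, Dict, Tuple

# One segment: (duration in seconds, {speaker: score}).
# The score is assumed to be an average log-likelihood for the segment;
# the abstract does not say which score the thesis actually uses.
Segment = Tuple[float, Dict[str, float]]


class RollingSpeakerVote:
    """Hypothetical smoother: duration-weighted average score over recent segments."""

    def __init__(self, window_size: int = 3) -> None:
        # window_size is an assumed value, not taken from the thesis.
        self.window: Deque[Segment] = deque(maxlen=window_size)

    def update(self, duration: float, scores: Dict[str, float]) -> str:
        """Add a segment and return the best speaker over the current window.

        Assumes every segment is scored against the same set of speakers.
        """
        if duration <= 0:
            raise ValueError("segment duration must be positive")
        self.window.append((duration, scores))
        total = sum(d for d, _ in self.window)
        weighted = {
            speaker: sum(d * s[speaker] for d, s in self.window) / total
            for speaker in scores
        }
        return max(weighted, key=weighted.get)


vote = RollingSpeakerVote(window_size=3)
predictions = [
    vote.update(4.0, {"A": -1.0, "B": -3.0}),  # long segment, clearly A
    vote.update(0.5, {"A": -2.5, "B": -2.0}),  # short segment that favors B on its own
    vote.update(5.0, {"A": -4.0, "B": -1.0}),  # long segment, clearly B
]
print(predictions)  # ['A', 'A', 'B']
```

In this toy run the half-second segment would be labeled B if scored alone, but the duration weighting lets the preceding four-second segment dominate, so the forwarded prediction stays A until a long B segment arrives.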