DEVELOPMENT OF A GMM-UBM BASED ONLINE TRANSCRIPTION SYSTEM

Bibliographic Details
Main Author: Cil Hardianto Satriawan (NIM: 23515053)
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/26281
Institution: Institut Teknologi Bandung
Description
Summary: Online transcription is the process of determining "who speaks what" from an audio stream of an ongoing conversation. It differs from offline transcription, where the entire conversation has already been recorded. Online recognition is needed when the contents and speakers of an ongoing conversation must be recognized immediately, e.g., automatic transcription of live broadcast talk shows and meetings. It is also needed for applications that use transcripts as source data for further processing, such as sentiment analysis of an ongoing phone call. Here, we propose an online GMM-UBM speaker recognition system and compare it with a baseline offline system. The proposed online speaker recognition system recognizes speakers immediately after a speaker change, using the Bayesian Information Criterion (BIC) and log Mel-frequency energies (LMFE) as metrics. As a post-processing step, a rolling window of speaker segments is gathered and the time-weighted average speaker likelihoods are calculated. The highest-scoring speaker within the window is then forwarded as the prediction. Speaker error rates (SER) of 25.5% and 18.5% were obtained for the proposed online system and the baseline offline system, respectively. In exchange for this higher error rate, the online system achieved an average latency of 0.21 times the duration of the input segment, compared with 1.10 for the offline system. The online speaker recognition system was then integrated with an existing Indonesian-language online speech recognition system to produce the final online transcription system.
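
The post-processing step described in the summary (a rolling window of speaker segments, time-weighted average likelihoods, highest-scoring speaker forwarded) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function name `smooth_speaker`, the window size of 3, and the representation of each segment as a duration plus a per-speaker log-likelihood dictionary are all assumptions made for illustration.

```python
from collections import deque


def smooth_speaker(segments, window=3):
    """Yield one smoothed speaker prediction per incoming segment.

    segments: iterable of (duration_seconds, {speaker: log_likelihood}) pairs.
              Assumes every segment scores the same set of speakers and
              that every duration is positive.
    window:   number of most recent segments to average over (assumed value).
    """
    recent = deque(maxlen=window)  # rolling window; oldest segment drops out
    for duration, scores in segments:
        recent.append((duration, scores))
        total_time = sum(d for d, _ in recent)
        averaged = {}
        # Weight each segment's score by its share of the window's duration.
        for d, seg_scores in recent:
            for speaker, score in seg_scores.items():
                averaged[speaker] = averaged.get(speaker, 0.0) + d * score / total_time
        # Forward the highest-scoring speaker within the window.
        yield max(averaged, key=averaged.get)


# Hypothetical scores: a short 0.5 s segment briefly favours speaker "B".
example = [
    (2.0, {"A": -1.0, "B": -3.0}),
    (0.5, {"A": -4.0, "B": -1.0}),
    (3.0, {"A": -2.0, "B": -2.5}),
]
print(list(smooth_speaker(example)))            # ['A', 'A', 'A']
print(list(smooth_speaker(example, window=1)))  # ['A', 'B', 'A']
```

With a window of one segment, the short middle segment flips the prediction to "B". With a window of three, its low time weight lets the longer neighbouring segments dominate, which is the kind of smoothing the time-weighted average is meant to provide.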