PUNCTUATION PREDICTION IN AUTOMATIC SPEECH RECOGNITION SYSTEM RESULTS USING SEQUENCE LABELLING AND MACHINE TRANSLATION-BASED APPROACHES

Automatic Speech Recognition (ASR) systems provide output in the form of speech recognition text. This text is generally not punctuated (Ostendorf et al., 2008). The formatting of speech recognition results is important for both humans and machines, because it can eliminate the ambiguity of meani...

Bibliographic Details
Main Author: Irfaan Dzakiy, M.
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/72145
Institution: Institut Teknologi Bandung
id id-itb.:72145
keywords automatic speech recognition results, transcript formatting, punctuation prediction, conditional random fields, neural machine translation
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Automatic Speech Recognition (ASR) systems output recognized text that is generally unpunctuated (Ostendorf et al., 2008). Formatting this output matters for both humans and machines: punctuation removes ambiguity in sentence meaning and makes the text usable in various NLP tasks. This research adds periods, commas, and question marks to ASR output. Punctuation prediction can be approached through Language Modeling, Sequence Labelling, or Machine Translation; in previous research, the best F1 scores came from the Sequence Labelling and Machine Translation approaches. The sequence labelling approach uses a Conditional Random Fields (CRF) model with various word range and n_gram configurations (Lu and Ng, 2010). The machine translation approach uses a Neural Machine Translation model with RNN, Bi-RNN, CNN, and Transformer encoders and RNN, CNN, and Transformer decoders (Vandeghinste et al., 2018). The study uses the Indo4B corpus and text from YouTube automatic captions. It also compares sampling techniques for handling the imbalance in the number of punctuation marks in the dataset. Experiments varied the sampling method and the architecture configuration to find the best combination. The best sampling method was Random Undersampling, which produces a dataset with a balanced distribution of punctuation marks. The best model was the CRF with word range 6 and n_gram 3, with F-measures of 78.69% for periods, 40.30% for commas, and 81.54% for question marks. ASR recognition at various F1 score levels was also simulated. With the best CRF model on simulated ASR output at a 100% F1 score, the best F-measures were 66.59% for periods, 20.75% for commas, and 40.36% for question marks.
format Final Project
author Irfaan Dzakiy, M.
title PUNCTUATION PREDICTION IN AUTOMATIC SPEECH RECOGNITION SYSTEM RESULTS USING SEQUENCE LABELLING AND MACHINE TRANSLATION-BASED APPROACHES
url https://digilib.itb.ac.id/gdl/view/72145
_version_ 1822992451389882368
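
The abstract frames punctuation prediction as sequence labelling with a "word range" and an "n_gram" setting. The sketch below is one plausible reading of that setup, not the author's implementation: each word is labelled with the punctuation mark that follows it, and the features for a word are the n-grams found within a window of neighbouring words. The label names, the `<pad>` token, the function names, and the Indonesian example sentence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Illustrative tag set (an assumption; the record does not list the thesis's labels).
PUNCT_LABELS = {".": "PERIOD", ",": "COMMA", "?": "QUESTION"}


def to_labelled_tokens(text: str) -> List[Tuple[str, str]]:
    """Turn punctuated text into (lowercased word, label) pairs.

    The label names the mark attached to the end of the word, or "O" if none.
    """
    pairs = []
    for raw in text.split():
        label = "O"
        if raw[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[raw[-1]]
            raw = raw[:-1]  # strip the mark so the input looks like ASR output
        if raw:
            pairs.append((raw.lower(), label))
    return pairs


def window_features(words: List[str], i: int,
                    word_range: int = 2, max_n: int = 2) -> Dict[str, str]:
    """Collect every n-gram (n <= max_n) inside [i - word_range, i + word_range].

    Keys record the n-gram order and its start offset relative to position i;
    positions outside the sentence are filled with "<pad>".
    """
    feats = {}
    lo, hi = i - word_range, i + word_range
    for n in range(1, max_n + 1):
        for start in range(lo, hi - n + 2):
            gram = [words[j] if 0 <= j < len(words) else "<pad>"
                    for j in range(start, start + n)]
            feats[f"{n}gram[{start - i}]"] = " ".join(gram)
    return feats


if __name__ == "__main__":
    pairs = to_labelled_tokens("Apa kabar? Saya baik, terima kasih.")
    words = [w for w, _ in pairs]
    for i, (word, label) in enumerate(pairs):
        print(word, label, window_features(words, i, word_range=1, max_n=2))
```

Under this reading, the reported best configuration would correspond to `word_range=6` and `max_n=3`. Per-word feature dictionaries of this kind are a common input shape for CRF toolkits, which then learn the label sequence jointly across the sentence.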