PUNCTUATION PREDICTION IN AUTOMATIC SPEECH RECOGNITION SYSTEM RESULTS USING SEQUENCE LABELLING AND MACHINE TRANSLATION-BASED APPROACHES

Bibliographic Details
Main Author:	Irfaan Dzakiy, M.
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/72145
Institution:	Institut Teknologi Bandung

id	id-itb.:72145
spelling	id-itb.:72145 2023-03-06T09:57:02Z automatic speech recognition results, transcript formatting, punctuation prediction, conditional random fields, neural machine translation
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Automatic Speech Recognition (ASR) systems output recognized speech as text, and this text is generally unpunctuated (Ostendorf et al., 2008). Formatting the recognition results matters for both humans and machines: punctuation removes ambiguity in sentence meaning and makes the text usable in various NLP tasks. This research aims to add periods, commas, and question marks to ASR output. Punctuation prediction can be approached through language modeling, sequence labelling, or machine translation; in previous research, the best F1 scores were obtained with the sequence labelling and machine translation approaches. The sequence labelling approach uses a Conditional Random Fields (CRF) model with various word range and n_gram configurations (Lu and Ng, 2010). The machine translation approach uses a Neural Machine Translation model with RNN, Bi-RNN, CNN, and Transformer encoders and RNN, CNN, and Transformer decoders (Vandeghinste et al., 2018). The study used the Indo4B corpus and text from YouTube automatic captions. It also compared sampling techniques for handling the imbalance in the number of punctuation marks in the dataset. Experiments varied the sampling method and the architecture configuration to find the best combination. The best sampling method was Random Undersampling, which produces a dataset with a balanced distribution of punctuation marks. The best model was the CRF with word range 6 and n_gram 3, with F-measures of 78.69% for periods, 40.30% for commas, and 81.54% for question marks. In addition, ASR output at various levels of recognition F1 score was simulated. With the best CRF model, the highest F-measures were obtained when simulating ASR with a 100% F1 score: 66.59% for periods, 20.75% for commas, and 40.36% for question marks.
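The sequence labelling setup described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the label names, the reading of "word range" as a symmetric context window and "n_gram" as the maximum n-gram length, and every function name below are assumptions made for illustration only. The sketch shows how punctuated text might be turned into per-word labels, how window features might be built for a CRF, how random undersampling balances classes, and how a per-class F-measure is computed.

```python
import random

# Hypothetical label set: the punctuation mark that follows each word.
PUNCT = {".": "PERIOD", ",": "COMMA", "?": "QUESTION"}


def to_labelled_tokens(text):
    """Convert punctuated text into (word, label) pairs."""
    pairs = []
    for raw in text.split():
        word = raw.rstrip(".,?")
        if not word:  # token was punctuation only
            continue
        pairs.append((word.lower(), PUNCT.get(raw[-1], "O")))
    return pairs


def window_features(words, i, word_range=6, n_gram=3):
    """Binary n-gram features from a window around position i.

    Assumption: word_range is the number of words taken on each side,
    and n_gram is the longest n-gram extracted inside that window.
    """
    feats = {}
    lo = max(0, i - word_range)
    hi = min(len(words), i + word_range + 1)
    for n in range(1, n_gram + 1):
        for start in range(lo, hi - n + 1):
            gram = " ".join(words[start:start + n])
            feats[f"{n}g[{start - i}]={gram}"] = 1.0
    return feats


def random_undersample(examples, seed=0):
    """Randomly keep as many (item, label) examples per label as the rarest label has."""
    rng = random.Random(seed)
    by_label = {}
    for item, label in examples:
        by_label.setdefault(label, []).append((item, label))
    n = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced


def f_measure(gold, pred, label):
    """F1 score for one punctuation class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In a setup like this, the feature dictionaries from window_features would be passed to a CRF toolkit, and f_measure would be reported separately for periods, commas, and question marks, as the abstract does.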
format	Final Project
author	Irfaan Dzakiy, M.
title	PUNCTUATION PREDICTION IN AUTOMATIC SPEECH RECOGNITION SYSTEM RESULTS USING SEQUENCE LABELLING AND MACHINE TRANSLATION-BASED APPROACHES
url	https://digilib.itb.ac.id/gdl/view/72145
_version_	1822992451389882368

Similar Items