Transformer-based joint learning approach for text normalization in Vietnamese Automatic Speech Recognition Systems

In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve readability for users and the performance of downstream tasks. The task usually consists of two main sub-tasks: predicting and inserting punctuation (e.g...

Bibliographic Details
Main Authors: BUI, The Viet, LUONG, Tho Chi, TRAN, Oanh Thi
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects:
ASR
Online Access:https://ink.library.smu.edu.sg/sis_research/7591
https://ink.library.smu.edu.sg/context/sis_research/article/8594/viewcontent/TransformerBasedVietnameseASR_av.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-8594
record_format dspace
spelling sg-smu-ink.sis_research-8594 2022-12-12T08:02:00Z Transformer-based joint learning approach for text normalization in Vietnamese Automatic Speech Recognition Systems BUI, The Viet LUONG, Tho Chi TRAN, Oanh Thi In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve readability for users and the performance of downstream tasks. The task usually consists of two main sub-tasks: predicting and inserting punctuation (e.g., period, comma), and detecting and standardizing named entities (e.g., numbers, person names) from their spoken forms into appropriate written forms. To achieve these goals, we introduce a complete corpus comprising 87,700 sentences and investigate conditional joint learning approaches that globally optimize the two sub-tasks simultaneously. The experimental results are quite promising. Overall, the proposed architecture outperformed the conventional architecture, which trains individual models on the two sub-tasks separately. The joint models are further improved when integrated with the surrounding contexts (SCs). Specifically, the best model obtained F1 scores of 81.13% on the first sub-task and 94.41% on the second sub-task. 2022-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7591 info:doi/10.1080/01969722.2022.2145654 https://ink.library.smu.edu.sg/context/sis_research/article/8594/viewcontent/TransformerBasedVietnameseASR_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University ASR named entity recognition post-processing punctuator text normalization transformer-based joint learning models Numerical Analysis and Scientific Computing South and Southeast Asian Languages and Societies Theory and Algorithms
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic ASR
named entity recognition
post-processing
punctuator
text normalization
transformer-based joint learning models
Numerical Analysis and Scientific Computing
South and Southeast Asian Languages and Societies
Theory and Algorithms
spellingShingle ASR
named entity recognition
post-processing
punctuator
text normalization
transformer-based joint learning models
Numerical Analysis and Scientific Computing
South and Southeast Asian Languages and Societies
Theory and Algorithms
BUI, The Viet
LUONG, Tho Chi
TRAN, Oanh Thi
Transformer-based joint learning approach for text normalization in Vietnamese Automatic Speech Recognition Systems
description In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve readability for users and the performance of downstream tasks. The task usually consists of two main sub-tasks: predicting and inserting punctuation (e.g., period, comma), and detecting and standardizing named entities (e.g., numbers, person names) from their spoken forms into appropriate written forms. To achieve these goals, we introduce a complete corpus comprising 87,700 sentences and investigate conditional joint learning approaches that globally optimize the two sub-tasks simultaneously. The experimental results are quite promising. Overall, the proposed architecture outperformed the conventional architecture, which trains individual models on the two sub-tasks separately. The joint models are further improved when integrated with the surrounding contexts (SCs). Specifically, the best model obtained F1 scores of 81.13% on the first sub-task and 94.41% on the second sub-task.
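The abstract above describes a joint learning setup: one shared transformer encoder feeding two token-level classifiers, one for punctuation insertion and one for spoken-to-written named-entity normalization, optimized together. The sketch below is an illustrative reconstruction of that idea, not the authors' released code; the class and head names (JointNormalizer, punct_head, entity_head), layer sizes, and tag inventories are assumptions made for the example.
```python
# Minimal sketch of joint learning for ASR text normalization (assumed design,
# not the paper's implementation): a shared Transformer encoder with two
# token-level heads trained with a summed cross-entropy loss.
import torch
import torch.nn as nn

class JointNormalizer(nn.Module):
    def __init__(self, vocab_size, n_punct_tags, n_entity_tags,
                 d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.punct_head = nn.Linear(d_model, n_punct_tags)    # e.g. O / COMMA / PERIOD
        self.entity_head = nn.Linear(d_model, n_entity_tags)  # e.g. BIO tags for numbers, names

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))                # shared contextual representations
        return self.punct_head(h), self.entity_head(h)

model = JointNormalizer(vocab_size=8000, n_punct_tags=3, n_entity_tags=9)
tokens = torch.randint(0, 8000, (2, 32))                       # dummy batch of spoken-form token ids
punct_logits, entity_logits = model(tokens)

# Joint training step: sum the two token-level losses so both sub-tasks are
# optimized simultaneously through the shared encoder.
punct_gold = torch.randint(0, 3, (2, 32))
entity_gold = torch.randint(0, 9, (2, 32))
loss = (nn.functional.cross_entropy(punct_logits.reshape(-1, 3), punct_gold.reshape(-1))
        + nn.functional.cross_entropy(entity_logits.reshape(-1, 9), entity_gold.reshape(-1)))
loss.backward()
```
In a setup like this, the "surrounding contexts" mentioned in the abstract would correspond to feeding neighboring sentences into the encoder alongside the current one; how exactly the paper integrates them is not specified in this record.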
format text
author BUI, The Viet
LUONG, Tho Chi
TRAN, Oanh Thi
author_facet BUI, The Viet
LUONG, Tho Chi
TRAN, Oanh Thi
author_sort BUI, The Viet
title Transformer-based joint learning approach for text normalization in Vietnamese Automatic Speech Recognition Systems
title_short Transformer-based joint learning approach for text normalization in Vietnamese Automatic Speech Recognition Systems
title_full Transformer-based joint learning approach for text normalization in Vietnamese Automatic Speech Recognition Systems
title_fullStr Transformer-based joint learning approach for text normalization in Vietnamese Automatic Speech Recognition Systems
title_full_unstemmed Transformer-based joint learning approach for text normalization in Vietnamese Automatic Speech Recognition Systems
title_sort transformer-based joint learning approach for text normalization in vietnamese automatic speech recognition systems
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/7591
https://ink.library.smu.edu.sg/context/sis_research/article/8594/viewcontent/TransformerBasedVietnameseASR_av.pdf
_version_ 1770576379172093952