An efficient transformer-based model for Vietnamese punctuation prediction

Bibliographic Details
Main Authors: TRAN, Hieu, DINH, Cuong V., PHAM, Hong Quang, NGUYEN, Binh T.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/7102
https://ink.library.smu.edu.sg/context/sis_research/article/8105/viewcontent/Tran2021_AnEfficientTransformer_Vietnamese_pv.pdf
Institution: Singapore Management University
Description
Summary: In both formal and informal texts, missing punctuation marks make the texts confusing and challenging to read. This paper conducts exhaustive experiments to investigate the benefits of pre-trained Transformer-based models on two Vietnamese punctuation datasets. The experimental results show that our models achieve encouraging results, and that adding Bi-LSTM and/or CRF layers on top of the proposed models further boosts performance. Finally, our best model significantly surpasses state-of-the-art approaches on both the novel and news datasets for the Vietnamese language, gaining up to 21.45% and 18.27% in the overall F1-scores, respectively.
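
The abstract describes the model family only at a high level: a pre-trained Transformer encoder used as a token classifier for punctuation, optionally with Bi-LSTM and CRF layers on top. The sketch below is not the authors' code; it is a minimal illustration of that stack under stated assumptions. The encoder name "vinai/phobert-base", the punctuation tag set, and the use of the pytorch-crf package are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of a Transformer + (optional) Bi-LSTM + (optional) CRF
# punctuation tagger. Assumptions: PhoBERT as the Vietnamese encoder,
# a four-label tag set, and the pytorch-crf package for the CRF layer.
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf

PUNCT_TAGS = ["O", "COMMA", "PERIOD", "QUESTION"]  # assumed label set


class PunctuationTagger(nn.Module):
    def __init__(self, encoder_name="vinai/phobert-base",
                 use_bilstm=True, use_crf=True, lstm_hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.use_bilstm = use_bilstm
        self.use_crf = use_crf
        if use_bilstm:
            # Bi-LSTM re-contextualizes the encoder outputs.
            self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True,
                                  bidirectional=True)
            hidden = 2 * lstm_hidden
        self.classifier = nn.Linear(hidden, len(PUNCT_TAGS))
        if use_crf:
            self.crf = CRF(len(PUNCT_TAGS), batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        # Contextual token representations from the pre-trained encoder.
        x = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        if self.use_bilstm:
            x, _ = self.bilstm(x)
        emissions = self.classifier(x)
        mask = attention_mask.bool()
        if labels is not None:
            if self.use_crf:
                # CRF forward returns the log-likelihood; negate for a loss.
                return -self.crf(emissions, labels, mask=mask,
                                 reduction="mean")
            # Token-level cross-entropy, computed only on non-padded positions.
            loss_fn = nn.CrossEntropyLoss()
            return loss_fn(emissions[mask], labels[mask])
        # Inference: Viterbi decoding with the CRF, otherwise per-token argmax.
        if self.use_crf:
            return self.crf.decode(emissions, mask=mask)
        return emissions.argmax(dim=-1)
```

Toggling use_bilstm and use_crf reproduces the four variants the abstract compares (encoder only, +Bi-LSTM, +CRF, +Bi-LSTM+CRF); the actual hyperparameters and training setup are those reported in the paper itself.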