Punctuation prediction for Vietnamese texts using conditional random fields

Bibliographic Details
Main Authors: PHAM, Hong Quang; NGUYEN, Binh T.; CUONG, Nguyen Viet
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2019
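Subjects: Conditional random field; Punctuation prediction; Sequence labeling; Vietnamese language; Numerical Analysis and Scientific Computing; South and Southeast Asian Languages and Societies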
Online Access: https://ink.library.smu.edu.sg/sis_research/7816
https://ink.library.smu.edu.sg/context/sis_research/article/8819/viewcontent/3368926.3369716_pv.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8819
doi 10.1145/3368926.3369716
published 2019-12-01
license http://creativecommons.org/licenses/by-nc-nd/4.0/
series Research Collection School Of Computing and Information Systems
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
topic Conditional random field
Punctuation prediction
Sequence labeling
Vietnamese language
Numerical Analysis and Scientific Computing
South and Southeast Asian Languages and Societies
description We investigate punctuation prediction for Vietnamese. The task matters because it can add suitable punctuation marks to machine-transcribed speech, which usually lacks them. Following previous work on English and Chinese, we formulate the task as a sequence labeling problem. We then apply a conditional random field (CRF) model to solve it and propose a set of features useful for prediction. We also build two corpora, one from Vietnamese online news and one from movie subtitles, and run extensive experiments on them. Finally, we ask four volunteers to insert punctuation marks into a small sample of our data. The experimental results show that the problem is challenging even for humans, and that our model achieves performance close to that of the human annotators.
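The sequence labeling formulation described in the abstract can be illustrated with a short Python sketch. It assumes a hypothetical label set in which each token is tagged with the punctuation mark that follows it (`O`, `COMMA`, `PERIOD`, `QUESTION`, `EXCLAM`), splits on whitespace so that tokens are Vietnamese syllables rather than segmented words, and lowercases input to resemble unpunctuated transcripts. The helper names `to_sequence`, `word_features` and `restore` are illustrative, and the window features are generic CRF-style features, not the feature set proposed by the authors; training the CRF itself is left to a CRF toolkit.

```python
import re

# Hypothetical label inventory: each token is tagged with the punctuation
# mark that follows it, or "O" if none. The paper's label set may differ.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION", "!": "EXCLAM"}
LABEL_TO_PUNCT = {label: mark for mark, label in PUNCT_TO_LABEL.items()}


def to_sequence(text):
    """Convert punctuated text into parallel word and label lists.

    Tokens are whitespace-separated syllables (no Vietnamese word
    segmentation) and are lowercased to resemble ASR output.
    """
    words, labels = [], []
    for token in re.findall(r"[^\s,.?!]+|[,.?!]", text):
        if token in PUNCT_TO_LABEL:
            # Attach the mark to the preceding token; a mark with no
            # preceding token is dropped.
            if words:
                labels[-1] = PUNCT_TO_LABEL[token]
        else:
            words.append(token.lower())
            labels.append("O")
    return words, labels


def word_features(words, i):
    """Generic window features for token i (illustrative, not the paper's)."""
    feats = {"w0": words[i]}
    for offset in (-2, -1, 1, 2):
        j = i + offset
        # Pad positions that fall outside the sentence.
        feats[f"w{offset:+d}"] = words[j] if 0 <= j < len(words) else "<PAD>"
    # Bigram with the next token: punctuation often depends on what follows.
    feats["w0|w+1"] = feats["w0"] + "|" + feats["w+1"]
    feats["is_last"] = i == len(words) - 1
    return feats


def restore(words, labels):
    """Rebuild punctuated text from words and (predicted) labels."""
    return " ".join(w + LABEL_TO_PUNCT.get(lab, "") for w, lab in zip(words, labels))


if __name__ == "__main__":
    words, labels = to_sequence("Hôm nay trời đẹp, chúng ta đi chơi nhé?")
    print(list(zip(words, labels)))
    print(word_features(words, 3))
    print(restore(words, labels))
```

Each `word_features` result is a dict of string and boolean values, one per token, which is the per-token input shape that CRF toolkits such as sklearn-crfsuite accept. A trained model would predict the label list for an unpunctuated transcript, and `restore` turns those labels back into punctuated text.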