Paraphrase Identification in Vietnamese Documents
In this paper, we investigate the task of paraphrase identification in Vietnamese documents, that is, identifying whether two sentences have the same meaning. This task has been shown to be an important research dimension with practical applications in natural language processing and data mining. We c...
Main Authors: | Ngo, Xuan Bach; Tran, Thi Oanh; Nguyen, Trung Hai; Tu, Minh Phuong |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2018 |
Subjects: | |
Online Access: | http://repository.vnu.edu.vn/handle/VNU_123/61163 |
Institution: | Vietnam National University, Hanoi |
id |
oai:112.137.131.14:VNU_123-61163 |
---|---|
record_format |
dspace |
spelling |
oai:112.137.131.14:VNU_123-61163 2018-01-29T20:03:37Z Paraphrase Identification in Vietnamese Documents Ngo, Xuan Bach; Tran, Thi Oanh; Nguyen, Trung Hai; Tu, Minh Phuong Paraphrase Identification Semantic Similarity Support Vector Machines Maximum Entropy Model Naive Bayes Classification K-Nearest Neighbor In this paper, we investigate the task of paraphrase identification in Vietnamese documents, that is, identifying whether two sentences have the same meaning. This task has been shown to be an important research dimension with practical applications in natural language processing and data mining. We choose to model the task as a classification problem and explore different types of features to represent sentences. We also introduce a paraphrase corpus for Vietnamese, vnPara, which consists of 3000 Vietnamese sentence pairs. We describe a series of experiments using various linguistic features and different machine learning algorithms, including Support Vector Machines, Maximum Entropy Model, Naive Bayes, and k-Nearest Neighbors. The results are promising with the best model achieving up to 90% accuracy. To the best of our knowledge, this is the first attempt to solve the task of paraphrase identification for Vietnamese. 2018-01-29T08:11:54Z 2018-01-29T08:11:54Z 2015 Article http://repository.vnu.edu.vn/handle/VNU_123/61163 en application/pdf IEEE |
institution |
Vietnam National University, Hanoi |
building |
VNU Library & Information Center |
country |
Vietnam |
collection |
VNU Digital Repository |
language |
English |
topic |
Paraphrase Identification; Semantic Similarity; Support Vector Machines; Maximum Entropy Model; Naive Bayes Classification; K-Nearest Neighbor |
description |
In this paper, we investigate the task of paraphrase
identification in Vietnamese documents, that is, identifying whether
two sentences have the same meaning. This task has been shown to
be an important research dimension with practical applications in
natural language processing and data mining. We choose to model
the task as a classification problem and explore different types of
features to represent sentences. We also introduce a paraphrase
corpus for Vietnamese, vnPara, which consists of 3000 Vietnamese
sentence pairs. We describe a series of experiments using various
linguistic features and different machine learning algorithms,
including Support Vector Machines, Maximum Entropy Model,
Naive Bayes, and k-Nearest Neighbors. The results are promising
with the best model achieving up to 90% accuracy. To the best
of our knowledge, this is the first attempt to solve the task of
paraphrase identification for Vietnamese. |
format |
Article |
author |
Ngo, Xuan Bach; Tran, Thi Oanh; Nguyen, Trung Hai; Tu, Minh Phuong |
author_sort |
Ngo, Xuan Bach |
title |
Paraphrase Identification in Vietnamese Documents |
publisher |
IEEE |
publishDate |
2018 |
url |
http://repository.vnu.edu.vn/handle/VNU_123/61163 |
_version_ |
1680965778226020352 |
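
The abstract's framing of paraphrase identification as a classification problem over sentence-pair features can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three features (Jaccard overlap, bag-of-words cosine, length ratio), the whitespace tokenizer (real Vietnamese text would need word segmentation), the hand-written English toy pairs (they are not taken from vnPara), and the pure-Python k-nearest-neighbor classifier, which stands in for one of the four algorithm families the abstract names. It does not reproduce the paper's feature set or results.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase and split on whitespace (a stand-in for real word segmentation)."""
    return sentence.lower().split()


def pair_features(s1, s2):
    """Map a sentence pair to (jaccard, cosine, length_ratio).

    These features are illustrative assumptions, not the paper's feature set.
    """
    t1, t2 = tokenize(s1), tokenize(s2)
    set1, set2 = set(t1), set(t2)
    union = set1 | set2
    jaccard = len(set1 & set2) / len(union) if union else 0.0

    # Cosine similarity over bag-of-words counts.
    c1, c2 = Counter(t1), Counter(t2)
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    cosine = dot / norm if norm else 0.0

    longer = max(len(t1), len(t2))
    length_ratio = min(len(t1), len(t2)) / longer if longer else 1.0
    return (jaccard, cosine, length_ratio)


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: label 1 = paraphrase, 0 = not a paraphrase.
TOY_PAIRS = [
    ("the cat sat on the mat", "the cat is sitting on the mat", 1),
    ("he bought a new car", "he purchased a new car", 1),
    ("the meeting starts at noon", "the meeting begins at noon", 1),
    ("the cat sat on the mat", "stock prices fell sharply today", 0),
    ("he bought a new car", "rain is expected this weekend", 0),
    ("the meeting starts at noon", "she plays the violin very well", 0),
]

TRAIN = [(pair_features(a, b), label) for a, b, label in TOY_PAIRS]

if __name__ == "__main__":
    query = pair_features("she opened the door", "she opened the front door")
    print(knn_predict(TRAIN, query))
```

Surface-overlap features like these only capture lexical similarity; pairs that share meaning but few words would need richer linguistic features of the kind the abstract alludes to.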