Paraphrase Identification in Vietnamese Documents

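In this paper, we investigate the task of paraphrase identification in Vietnamese documents, that is, identifying whether two sentences have the same meaning. This task has been shown to be an important research dimension with practical applications in natural language processing and data mining. We choose to model the task as a classification problem and explore different types of features to represent sentences. We also introduce a paraphrase corpus for Vietnamese, vnPara, which consists of 3000 Vietnamese sentence pairs. We describe a series of experiments using various linguistic features and different machine learning algorithms, including Support Vector Machines, Maximum Entropy Model, Naive Bayes, and k-Nearest Neighbors. The results are promising, with the best model achieving up to 90% accuracy. To the best of our knowledge, this is the first attempt to solve the task of paraphrase identification for Vietnamese.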

Bibliographic Details
Main Authors: Ngo, Xuan Bach, Tran, Thi Oanh, Nguyen, Trung Hai, Tu, Minh Phuong
Format: Article
Language: English
Published: IEEE 2018
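Subjects: Paraphrase Identification; Semantic Similarity; Support Vector Machines; Maximum Entropy Model; Naive Bayes Classification; K-Nearest Neighbor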
Online Access: http://repository.vnu.edu.vn/handle/VNU_123/61163
Institution: Vietnam National University, Hanoi
id oai:112.137.131.14:VNU_123-61163
record_format dspace
spelling oai:112.137.131.14:VNU_123-61163 2018-01-29T20:03:37Z Paraphrase Identification in Vietnamese Documents Ngo, Xuan Bach Tran, Thi Oanh Nguyen, Trung Hai Tu, Minh Phuong Paraphrase Identification Semantic Similarity Support Vector Machines Maximum Entropy Model Naive Bayes Classification K-Nearest Neighbor In this paper, we investigate the task of paraphrase identification in Vietnamese documents, that is, identifying whether two sentences have the same meaning. This task has been shown to be an important research dimension with practical applications in natural language processing and data mining. We choose to model the task as a classification problem and explore different types of features to represent sentences. We also introduce a paraphrase corpus for Vietnamese, vnPara, which consists of 3000 Vietnamese sentence pairs. We describe a series of experiments using various linguistic features and different machine learning algorithms, including Support Vector Machines, Maximum Entropy Model, Naive Bayes, and k-Nearest Neighbors. The results are promising, with the best model achieving up to 90% accuracy. To the best of our knowledge, this is the first attempt to solve the task of paraphrase identification for Vietnamese. 2018-01-29T08:11:54Z 2018-01-29T08:11:54Z 2015 Article http://repository.vnu.edu.vn/handle/VNU_123/61163 en application/pdf IEEE
institution Vietnam National University, Hanoi
building VNU Library & Information Center
country Vietnam
collection VNU Digital Repository
language English
topic Paraphrase Identification
Semantic Similarity
Support Vector Machines
Maximum Entropy Model
Naive Bayes Classification
K-Nearest Neighbor
description In this paper, we investigate the task of paraphrase identification in Vietnamese documents, that is, identifying whether two sentences have the same meaning. This task has been shown to be an important research dimension with practical applications in natural language processing and data mining. We choose to model the task as a classification problem and explore different types of features to represent sentences. We also introduce a paraphrase corpus for Vietnamese, vnPara, which consists of 3000 Vietnamese sentence pairs. We describe a series of experiments using various linguistic features and different machine learning algorithms, including Support Vector Machines, Maximum Entropy Model, Naive Bayes, and k-Nearest Neighbors. The results are promising, with the best model achieving up to 90% accuracy. To the best of our knowledge, this is the first attempt to solve the task of paraphrase identification for Vietnamese.
format Article
author Ngo, Xuan Bach
Tran, Thi Oanh
Nguyen, Trung Hai
Tu, Minh Phuong
author_facet Ngo, Xuan Bach
Tran, Thi Oanh
Nguyen, Trung Hai
Tu, Minh Phuong
author_sort Ngo, Xuan Bach
title Paraphrase Identification in Vietnamese Documents
title_sort paraphrase identification in vietnamese documents
publisher IEEE
publishDate 2018
url http://repository.vnu.edu.vn/handle/VNU_123/61163
_version_ 1680965778226020352