Feature-based similarity method for aligning the Malay and English news documents

A corpus-based translation approach can be used to obtain reliable translation knowledge, complementing dictionaries and machine translation. However, such corpora are scarce, especially for low-resource languages. Many works have been reported for the alignments of mu...

Bibliographic Details
Main Authors:	Nasharuddin, Nurul Amelina; Abdullah, Muhamad Taufik; Azman, Azreen; Abdul Kadir, Rabiah; Herrera-Viedma, Enrique
Format:	Article
Language:	English
Published:	IJCT Foundation 2013
Online Access:	http://psasir.upm.edu.my/id/eprint/30693/1/Feature-based%20similarity%20method%20.pdf http://psasir.upm.edu.my/id/eprint/30693/ http://cirworld.com/journals/index.php/ijct/article/view/2556
Institution:	Universiti Putra Malaysia

id	my.upm.eprints.30693
record_format	eprints
spelling	my.upm.eprints.30693 2015-12-07T03:44:36Z http://psasir.upm.edu.my/id/eprint/30693/ Feature-based similarity method for aligning the Malay and English news documents. Nasharuddin, Nurul Amelina; Abdullah, Muhamad Taufik; Azman, Azreen; Abdul Kadir, Rabiah; Herrera-Viedma, Enrique. IJCT Foundation 2013-10-15 Article PeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/30693/1/Feature-based%20similarity%20method%20.pdf Nasharuddin, Nurul Amelina and Abdullah, Muhamad Taufik and Azman, Azreen and Abdul Kadir, Rabiah and Herrera-Viedma, Enrique (2013) Feature-based similarity method for aligning the Malay and English news documents. International Journal of Computers and Technology, 11 (4). pp. 2410-2421. ISSN 2277-3061 http://cirworld.com/journals/index.php/ijct/article/view/2556
institution	Universiti Putra Malaysia
building	UPM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Putra Malaysia
content_source	UPM Institutional Repository
url_provider	http://psasir.upm.edu.my/
language	English
description	A corpus-based translation approach can be used to obtain reliable translation knowledge, complementing dictionaries and machine translation. However, such corpora are scarce, especially for low-resource languages. Many works have been reported on aligning multilingual documents, especially among European languages, but fewer focus on languages with limited linguistic resources. One of the challenges is to align the available multilingual documents to create comparable corpora for such languages. This article describes an alignment method that uses statistical features of the documents: their titles, the text of their contents, and the named entities present in each document. The method focuses on English and Malay news documents, with Malay considered a low-resource language. Source and target documents were compared in pairs. Accuracy, precision, and recall were used to evaluate the results, with three relevance scales (Same story, Shared aspect, and Unrelated) used to assess the alignment pairs. The results indicate that the method performed well in aligning the news documents, with an accuracy of 96% and an average precision of 81%.
format	Article
author	Nasharuddin, Nurul Amelina; Abdullah, Muhamad Taufik; Azman, Azreen; Abdul Kadir, Rabiah; Herrera-Viedma, Enrique
author_sort	Nasharuddin, Nurul Amelina
title	Feature-based similarity method for aligning the Malay and English news documents
publisher	IJCT Foundation
publishDate	2013
url	http://psasir.upm.edu.my/id/eprint/30693/1/Feature-based%20similarity%20method%20.pdf http://psasir.upm.edu.my/id/eprint/30693/ http://cirworld.com/journals/index.php/ijct/article/view/2556
_version_	1643830135625875456
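
The abstract describes comparing source and target documents in pairs using their titles, content text, and named entities, then evaluating the alignments with precision and recall. The record does not give the article's actual formulas, so the following is only a minimal sketch under stated assumptions: Jaccard overlap as the per-feature similarity, a weighted sum to combine the three features, a fixed threshold, and named entities already extracted. The names `Doc`, `similarity`, `align`, and `precision_recall`, the weights, and the threshold are all hypothetical and not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    title: str
    text: str
    entities: frozenset  # named entities, assumed to be extracted beforehand


def tokens(s):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in s.split()} - {""}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(src, tgt, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of title, text, and entity overlap (weights are assumed)."""
    w_title, w_text, w_ent = weights
    return (
        w_title * jaccard(tokens(src.title), tokens(tgt.title))
        + w_text * jaccard(tokens(src.text), tokens(tgt.text))
        + w_ent * jaccard(set(src.entities), set(tgt.entities))
    )


def align(sources, targets, threshold=0.2):
    """Pair each source with its best-scoring target if it clears the threshold."""
    pairs = []
    for s in sources:
        best = max(targets, key=lambda t: similarity(s, t), default=None)
        if best is not None and similarity(s, best) >= threshold:
            pairs.append((s.doc_id, best.doc_id))
    return pairs


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against gold-standard pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

In this sketch, plain token overlap between Malay and English text is low unless the documents share names or loanwords, so the named-entity feature carries most of the cross-lingual signal. A fuller pipeline would likely map one language into the other before comparing, which is outside what this record describes.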
