A review on building bilingual comparable corpora for resource-limited languages

Bibliographic Details
Main Authors: Nasharuddin, Nurul Amelina; Abdullah, Muhamad Taufik; Azman, Azreen; Abdul Kadir, Rabiah
Format: Conference or Workshop Item
Language: English
Published: IEEE 2018
Online Access: http://psasir.upm.edu.my/id/eprint/69531/1/A%20review%20on%20building%20bilingual%20comparable%20corpora%20for%20resource-limited%20languages.pdf
http://psasir.upm.edu.my/id/eprint/69531/
Institution: Universiti Putra Malaysia
id my.upm.eprints.69531
record_format eprints
citation Nasharuddin, Nurul Amelina and Abdullah, Muhamad Taufik and Azman, Azreen and Abdul Kadir, Rabiah (2018) A review on building bilingual comparable corpora for resource-limited languages. In: 2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP'18), 26-28 Mar. 2018, Le Méridien Kota Kinabalu, Sabah, Malaysia. (pp. 113-118). DOI: 10.1109/INFRKM.2018.8464798. Peer reviewed.
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description Information retrieval for certain Asian languages is hampered by limited knowledge resources such as bilingual and multilingual dictionaries and corpora, so there is a need to create multilingual resources for these languages. One approach is to align documents automatically by estimating the likelihood that two documents, not necessarily written in the same language, are related to each other. Multilingual corpora can then be built automatically from the aligned documents. Numerous approaches to document alignment have been developed to date. This paper gives an overview of the progress made in bilingual and multilingual document alignment over the last five years. It also discusses current progress in developing bilingual comparable corpora, especially for Malay, which is one of the resource-limited languages in Asia.