Equivalent Malay-Arabic data corpus collection

This paper introduces a strategy for searching for and collecting comparable sentences from Arabic-Malay corpus data. The method is intended for students, researchers and amateur translators who want to search for and compare sentence structures in Arabic and Malay. The first stage is to collect da...

Bibliographic Details
Main Authors: Muhamad Romli, Taj Rijal; Hassan Azhari, Abd Rauf; Mohamad, Hasnah
Format: Article
Language: English
Published: European Center for Science Education and Research 2016
Online Access: http://psasir.upm.edu.my/id/eprint/54182/1/Equivalent%20Malay-Arabic%20data%20corpus%20collection.pdf
http://psasir.upm.edu.my/id/eprint/54182/
http://journals.euser.org/index.php/ejls/article/view/607
Institution: Universiti Putra Malaysia
Language: English
id my.upm.eprints.54182
record_format eprints
spelling my.upm.eprints.54182 2018-03-02T03:40:27Z http://psasir.upm.edu.my/id/eprint/54182/ Equivalent Malay-Arabic data corpus collection Muhamad Romli, Taj Rijal Hassan Azhari, Abd Rauf Mohamad, Hasnah
European Center for Science Education and Research 2016 Article PeerReviewed text en http://psasir.upm.edu.my/id/eprint/54182/1/Equivalent%20Malay-Arabic%20data%20corpus%20collection.pdf Muhamad Romli, Taj Rijal and Hassan Azhari, Abd Rauf and Mohamad, Hasnah (2016) Equivalent Malay-Arabic data corpus collection. European Journal of Language and Literature Studies, 4 (1). pp. 65-73. ISSN 2411-9598; ESSN: 2411-4103 http://journals.euser.org/index.php/ejls/article/view/607 10.26417/ejls.v4i1.p65-73
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
description This paper introduces a strategy for searching for and collecting comparable sentences from Arabic-Malay corpus data. The method is intended for students, researchers and amateur translators who want to search for and compare sentence structures in Arabic and Malay. The first stage is to collect corpus data with high-impact titles from the press, which must be able to enlarge the scope of study, as stated by Maia (2003). The second stage is to search using specified keywords based on the selected high-impact titles, such as the 2010 and 2014 Football World Cups. The search uses the WebCorp engine (http://www.webcorp.org.uk/live/) and the open database Google (https://www.google.com). The third stage is to filter the data using the methods of Aker et al. (2012) and Braschler (1998), based on similar story, related story and similar aspects. At the fourth stage, every category is measured by Guidere's (2002) equivalence strength: strong comparability (SC), medium comparability (MC) and weak comparability (WC). At the last stage, comparable sentences in the two languages are compiled in parallel according to Mona Baker's (1992) levels of grouping: the sentence level, the combination-of-words level, and the grammatical, pragmatic and textual levels. The data analysis, based on the comparability theories of Mona Baker and Vinay and Darbelnet (1995), showed that a large number of sentences are at the same level of comparability in terms of information delivery. This can serve as additional evidence for the validity of the 'universal theory' in the science of translation.
format Article
author Muhamad Romli, Taj Rijal
Hassan Azhari, Abd Rauf
Mohamad, Hasnah
spellingShingle Muhamad Romli, Taj Rijal
Hassan Azhari, Abd Rauf
Mohamad, Hasnah
Equivalent Malay-Arabic data corpus collection
author_facet Muhamad Romli, Taj Rijal
Hassan Azhari, Abd Rauf
Mohamad, Hasnah
author_sort Muhamad Romli, Taj Rijal
title Equivalent Malay-Arabic data corpus collection
title_short Equivalent Malay-Arabic data corpus collection
title_full Equivalent Malay-Arabic data corpus collection
title_fullStr Equivalent Malay-Arabic data corpus collection
title_full_unstemmed Equivalent Malay-Arabic data corpus collection
title_sort equivalent malay-arabic data corpus collection
publisher European Center for Science Education and Research
publishDate 2016
url http://psasir.upm.edu.my/id/eprint/54182/1/Equivalent%20Malay-Arabic%20data%20corpus%20collection.pdf
http://psasir.upm.edu.my/id/eprint/54182/
http://journals.euser.org/index.php/ejls/article/view/607
_version_ 1643835583387140096
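
Illustrative note: the fourth stage in the description (grading candidate pairs as SC, MC or WC) can be sketched in Python as below. This is only a rough sketch under stated assumptions, not the authors' procedure. The record does not say how Guidere's equivalence strength is computed, so a Jaccard overlap of keyword sets is swapped in as a stand-in measure, and the two thresholds, the SentencePair type, the keyword vocabulary and the placeholder sentences are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the record does not state how SC/MC/WC are delimited.
STRONG_THRESHOLD = 0.6
MEDIUM_THRESHOLD = 0.3


@dataclass(frozen=True)
class SentencePair:
    """A candidate Malay/Arabic pair (hypothetical structure for illustration)."""
    malay: str
    arabic: str
    # Keywords are assumed to be already mapped to a shared vocabulary.
    malay_keywords: frozenset
    arabic_keywords: frozenset


def keyword_overlap(pair: SentencePair) -> float:
    """Jaccard overlap of the two keyword sets (a stand-in, not the paper's measure)."""
    union = pair.malay_keywords | pair.arabic_keywords
    if not union:
        return 0.0
    return len(pair.malay_keywords & pair.arabic_keywords) / len(union)


def comparability_label(score: float) -> str:
    """Map a score in [0, 1] to strong (SC), medium (MC) or weak (WC) comparability."""
    if score >= STRONG_THRESHOLD:
        return "SC"
    if score >= MEDIUM_THRESHOLD:
        return "MC"
    return "WC"


def group_by_strength(pairs):
    """Bucket sentence pairs by their comparability label."""
    groups = {"SC": [], "MC": [], "WC": []}
    for pair in pairs:
        groups[comparability_label(keyword_overlap(pair))].append(pair)
    return groups


# Placeholder pairs; the sentence texts are deliberately left as stand-ins.
example_pairs = [
    SentencePair("<Malay sentence 1>", "<Arabic sentence 1>",
                 frozenset({"spain", "world_cup", "2010"}),
                 frozenset({"spain", "world_cup", "2010"})),
    SentencePair("<Malay sentence 2>", "<Arabic sentence 2>",
                 frozenset({"brazil", "world_cup", "2014"}),
                 frozenset({"germany", "world_cup", "2014", "final"})),
    SentencePair("<Malay sentence 3>", "<Arabic sentence 3>",
                 frozenset({"world_cup"}),
                 frozenset({"league", "transfer", "coach", "world_cup"})),
]

groups = group_by_strength(example_pairs)
for label, members in groups.items():
    print(label, len(members))
```

The grouping step keeps every pair, including weak ones, so that a later stage (for example, sorting pairs by Baker's levels) can decide which strengths to compile in parallel.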