Monolingual and Cross-language Information Retrieval Approaches for Malay and English Language Document
This thesis concerns a Malay-English monolingual and cross-language information retrieval system. It presents pioneering work on the aspects that are important for developing a Malay-English information retrieval system. An improved Malay stemming algorithm has been developed to stem the var...
Main Author: | Abdullah, Muhamad Taufik |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2006 |
Subjects: | Bilingualism - Malay; Bilingualism - English |
Online Access: | http://psasir.upm.edu.my/id/eprint/5869/1/FSKTM_2006_1%20IR.pdf http://psasir.upm.edu.my/id/eprint/5869/ |
Institution: | Universiti Putra Malaysia |
id |
my.upm.eprints.5869 |
---|---|
record_format |
eprints |
spelling |
my.upm.eprints.5869 2022-01-13T02:54:28Z http://psasir.upm.edu.my/id/eprint/5869/ Monolingual and Cross-language Information Retrieval Approaches for Malay and English Language Document Abdullah, Muhamad Taufik 2006-02 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/5869/1/FSKTM_2006_1%20IR.pdf Abdullah, Muhamad Taufik (2006) Monolingual and Cross-language Information Retrieval Approaches for Malay and English Language Document. Doctoral thesis, Universiti Putra Malaysia. Bilingualism - Malay Bilingualism - English |
institution |
Universiti Putra Malaysia |
building |
UPM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Putra Malaysia |
content_source |
UPM Institutional Repository |
url_provider |
http://psasir.upm.edu.my/ |
language |
English |
topic |
Bilingualism - Malay; Bilingualism - English |
description |
This thesis concerns a Malay-English monolingual and cross-language information retrieval system. It presents pioneering work on the aspects that are important for developing a Malay-English information retrieval system. An improved Malay stemming algorithm has been developed to reduce the various word forms to their common root for the purpose of indexing and retrieving Malay documents. Four new stemming approaches are introduced for the Malay language: Rules-Frequency-Order (RFO), Minimum-Rules-Frequency-Order (MRFO), Rules-Frequency-Application-Order (RFAO), and Rules-Application-Frequency-Order (RAFO). The performance of the new Malay stemming algorithm and approaches was tested using the first two chapters of the Malay translation of the Quranic documents. The results show that the new stemming algorithm and approaches are superior to the previous stemming algorithm and approach. The retrieval effectiveness of the stemming algorithm and approaches was then tested on the actual Quranic collection using the vector space model and latent semantic indexing. The results show an improvement in performance from non-stemmed Malay to stemmed Malay, and also from the previous stemming algorithm to the new stemming algorithm.
Because the new stemming algorithm and approaches achieved good performance in Malay monolingual information retrieval, a Malay-English cross-language information retrieval experiment was also performed. The results again show an improvement in performance from non-stemmed Malay to stemmed Malay, and from the previous stemming algorithm to the new stemming algorithm. In addition, the results reveal that the new Malay stemming performed better than the English stemming in retrieving relevant documents. The results can serve as a reference for future experiments and research on cross-language document retrieval. |
format |
Thesis |
author |
Abdullah, Muhamad Taufik |
title |
Monolingual and Cross-language Information Retrieval Approaches for Malay and English Language Document |
publishDate |
2006 |
url |
http://psasir.upm.edu.my/id/eprint/5869/1/FSKTM_2006_1%20IR.pdf http://psasir.upm.edu.my/id/eprint/5869/ |
_version_ |
1724075263357091840 |