Developing the foundations of a Filipino wordnet

About 22,000,000 Filipinos use the Filipino language, according to the Philippine Census of 2000 (NSO, 2000). This work describes the methods used to develop the foundations of a WordNet for the Filipino language. The term foundation used in this paper sp...

Bibliographic Details
Main Authors: Bondoc, Jeremy; Garcia, Alvin; Lacaden, John Bryan; Yu, Hun Ping
Format: text
Language: English
Published: Animo Repository 2010
Subjects: Filipino language; WordNet (Computer program language); Computer Sciences
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/7654
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=8299&context=etd_bachelors
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_bachelors-8299
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etd_bachelors-8299 2023-01-12T07:56:05Z Developing the foundations of a Filipino wordnet Bondoc, Jeremy Garcia, Alvin Lacaden, John Bryan Yu, Hun Ping 2010-08-30T07:00:00Z text application/pdf https://animorepository.dlsu.edu.ph/etd_bachelors/7654 https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=8299&context=etd_bachelors Bachelor's Theses English Animo Repository Filipino language WordNet (Computer program language) Computer Sciences
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Filipino language
WordNet (Computer program language)
Computer Sciences
description About 22,000,000 Filipinos use the Filipino language, according to the Philippine Census of 2000 (NSO, 2000). This work describes the methods used to develop the foundations of a WordNet for the Filipino language. The term "foundation" here refers to the basic features of a WordNet. The first basic feature is a secure database design that closely resembles the MySQL database design for the English WordNet. This database was modified for the project. A system that facilitates the semi-automatic creation of Filipino synsets was also developed. One tool integrated in the system is the Morphological Analyzer and Generator for Tagalog (MAGTag). A concordancer was also implemented to help define Filipino synsets through context clues, and a bilingual translator helps the user translate an English word to Filipino. The system also accesses the English WordNet database to obtain the English synset or synsets for any English word the user enters. Finally, the system uses Sigma (Niles & Pease, 2001), an offline tool for obtaining the Suggested Upper Merged Ontology concept of an English synset. Each of these tools provides information that is stored in the Filipino WordNet database and helps in producing accurate synsets. Apart from the system, the researchers' work also includes the creation of 10,000 Filipino synsets that will serve as the base concepts for the Filipino WordNet. The researchers had Computer Science and non-Computer Science students evaluate the system. Based on the results gathered from the testers, it can be inferred that the user interface design of the synset creation assistant is effective in guiding people through the creation of Filipino synsets. In addition, the system decreased the time needed to create synsets.
The system developed by the researchers is the first of its kind, since other WordNet development systems do not include tools such as MAGTag, a concordancer, and a bilingual translator. Two resource persons manually evaluated the synsets created. The results show that the synsets are accurate with respect to both the word senses included in each set and the definition provided. It can be concluded that, by using various NLP tools and having sufficient knowledge of the Filipino language, the researchers were able to produce accurate synsets. One recommendation for improving the system is to study the unused database tables and fields, which could provide more information on a synset. It is also recommended to obtain more corpora for the concordancer and to improve the morphological analyzer so that it provides more accurate results. For the user interface, the researchers recommend making the design more guided by integrating more interface features. The Filipino synsets can be further improved by avoiding verbatim translation of English definitions into Filipino. Finally, the researchers recommend involving more linguists in the evaluation of the Filipino synsets to obtain more accurate synsets, because linguists have more knowledge of the language than non-linguists.
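The abstract does not say how the thesis's concordancer is implemented. As a rough illustration of the general keyword-in-context idea it refers to (showing a word with its neighbouring words so that a synset author can judge its sense), a minimal Python sketch might look like the following. The function names `tokenize` and `concordance`, the `width` parameter, and the two-sentence sample corpus are all assumptions made up for this example and are not taken from the thesis.

```python
import re


def tokenize(text):
    """Split text into word tokens, keeping internal hyphens and apostrophes.

    Hypothetical helper: a simple regex tokenizer, not the thesis's method.
    """
    return re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text)


def concordance(tokens, keyword, width=3):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Each line shows up to `width` tokens on either side of the match,
    with the matched token in square brackets. Matching is case-insensitive.
    """
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Made-up sample corpus for illustration only.
corpus = "Ang bata ay naglalaro sa bahay. Malaki ang bahay ng guro."

for line in concordance(tokenize(corpus), "bahay", width=2):
    print(line)
```

In a sketch like this, context windows ignore sentence boundaries; a fuller concordancer would probably split the corpus into sentences first so that the context clues shown for a word all come from the same sentence.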
format text
author Bondoc, Jeremy
Garcia, Alvin
Lacaden, John Bryan
Yu, Hun Ping
title Developing the foundations of a Filipino wordnet
publisher Animo Repository
publishDate 2010
url https://animorepository.dlsu.edu.ph/etd_bachelors/7654
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=8299&context=etd_bachelors
_version_ 1756432607251267584