Developing the foundations of a Filipino wordnet

Bibliographic Details
Main Authors: Bondoc, Jeremy; Garcia, Alvin; Lacaden, John Bryan; Yu, Hun Ping
Format: text
Language: English
Published: Animo Repository 2010
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/7654
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=8299&context=etd_bachelors
Institution: De La Salle University
Description
Summary: According to the Philippine Census of 2000 (NSO, 2000), about 22,000,000 Filipinos use the Filipino language. This work describes the methods employed in developing the foundations of a WordNet for the Filipino language. The term "foundation" in this paper refers to the basic features of a WordNet. The first basic feature is a secure database design that closely resembles the MySQL database design for the English WordNet; this database was modified for the project. A system that facilitates the semi-automatic creation of Filipino synsets was also developed. One tool integrated in the system is the Morphological Analyzer and Generator for Tagalog (MAGTag). A concordancer was also implemented to help in defining Filipino synsets through context clues, and a bilingual translator helps the user translate an English word to Filipino. The system also accesses the English WordNet database to obtain the English synset or synsets for any English word the user enters. Finally, the system uses Sigma (Niles & Pease, 2001), an offline tool for obtaining the Suggested Upper Merged Ontology concept of an English synset. Each of these tools provides information that is stored in the Filipino WordNet database and helps in producing accurate synsets. Apart from the system, the researchers' work also includes the creation of 10,000 Filipino synsets that will serve as the base concepts for the Filipino WordNet. The researchers had Computer Science and non-Computer Science students evaluate the system. The results gathered from the testers indicate that the user interface design of the synset creation assistant is effective in guiding people in the creation of Filipino synsets. In addition, the system decreased the time needed for creating synsets.
The system developed by the researchers is the first of its kind, since other WordNet development systems do not include tools such as MAGTag, a concordancer, and a bilingual translator. Two resource persons manually evaluated the synsets created. The results show that the synsets are accurate with respect to both the word senses included in each set and the definition provided. It can be concluded that, by using various NLP tools and having sufficient knowledge of the Filipino language, the researchers were able to produce accurate synsets. To improve the system, the researchers recommend studying the unused database tables and fields so that more information can be provided on a synset, obtaining more corpora for the concordancer, and improving the morphological analyzer so that it provides accurate results. For the user interface, they recommend making the design more guided by integrating more interface features. The Filipino synsets can be further improved by avoiding verbatim translation of English definitions into Filipino. Finally, the researchers recommend involving more linguists in the evaluation of the Filipino synsets, because linguists have more knowledge of the language than non-linguists.
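
The summary does not describe how the concordancer works internally. As an illustration only, a generic keyword-in-context (KWIC) concordancer, the kind of tool that surfaces context clues for a word, might be sketched as follows. This is not the researchers' implementation: the function name `concordance`, the `window` parameter, and the two-sentence `sample_corpus` are all hypothetical and were invented for this sketch.

```python
import re
from typing import List


def concordance(corpus: List[str], keyword: str, window: int = 3) -> List[str]:
    """Return keyword-in-context lines for every occurrence of `keyword`.

    Hypothetical helper for illustration; matching is case-insensitive
    and works on whole tokens only.
    """
    target = keyword.lower()
    hits = []
    for sentence in corpus:
        # Tokenize on word characters; hyphens are kept so that
        # reduplicated Filipino forms such as "araw-araw" stay whole.
        tokens = re.findall(r"[\w-]+", sentence)
        for i, token in enumerate(tokens):
            if token.lower() == target:
                # Up to `window` tokens of context on each side.
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append(f"{left} [{token}] {right}".strip())
    return hits


# Hypothetical two-sentence corpus, invented for illustration only.
sample_corpus = [
    "Bumili ako ng bagong aklat kahapon.",
    "Ang aklat na ito ay tungkol sa kasaysayan.",
]

for line in concordance(sample_corpus, "aklat"):
    print(line)
```

With this sample corpus, the sketch prints `ako ng bagong [aklat] kahapon` and `Ang [aklat] na ito ay`. A person defining a synset for "aklat" could read such lines to see how the word is used before writing its definition.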