Probabilistic approach to constituent structure induction for Filipino
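Grammar formalisms and parsing systems are functional resources for high-level natural language processing applications. Filipino lacks an extensive computational representation of its grammar and a broad-coverage parsing mechanism. Computational approaches to automatic grammar development fall into either supervised or unsupervised categories. Supervised methods have produced better results, but they require treebanks or bracketed corpora as input, and no computational resources that satisfy these input requirements are currently available for Filipino. Unsupervised approaches instead use statistical and probabilistic information to estimate the structure of the input language. This research develops an unsupervised grammar induction system for the Filipino language, focusing on constituent structure. Three models are presented to handle the distribution and substitutability of constituents. The models were evaluated on 1,264 sentences of length 1 to 10. Experiments with the Selection Model showed that the occurrence of a sequence is the most effective measure for identifying constituency. The substitutable constituents learned by the Greedy Merge Model highlighted the free word order of Filipino. The Constituent Context Model achieved the best scores of the three: 66.8% precision, 72.6% recall, and a 69.5% overall F-measure. These results are comparable to those of existing unsupervised parse induction systems, even though the training corpus is a fraction of the size used in prior work. The models do not properly handle dependencies between words and phrases; addressing these dependencies is recommended to further improve performance.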
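
The abstract reports that, in experiments with the Selection Model, the occurrence of a sequence was the most effective measure for identifying constituents. The Python sketch below illustrates that idea under stated assumptions; it is not the thesis's Selection Model. It counts how often each contiguous sequence of two or more tokens (here, toy part-of-speech tags) occurs in a corpus, then greedily brackets the most frequent non-crossing spans of a sentence. The functions `count_sequences`, `crosses`, and `select_brackets`, the `min_count` threshold, the tag set, and the greedy non-crossing selection rule are all hypothetical.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def count_sequences(corpus: Sequence[Sequence[str]]) -> Counter:
    """Count every contiguous sequence of two or more tokens across the corpus."""
    counts: Counter = Counter()
    for sent in corpus:
        n = len(sent)
        for i in range(n):
            for j in range(i + 2, n + 1):
                counts[tuple(sent[i:j])] += 1
    return counts


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans overlap without one containing the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def select_brackets(sent: Sequence[str], counts: Counter, min_count: int = 2) -> Set[Span]:
    """Hypothetical rule: greedily keep the most frequent spans that cross no kept span."""
    n = len(sent)
    # Candidate constituents: every span of 2+ tokens except the whole sentence.
    candidates = [(i, j) for i in range(n) for j in range(i + 2, n + 1) if (i, j) != (0, n)]
    # Most frequent sequences first; ties broken by position for determinism.
    candidates.sort(key=lambda s: (-counts[tuple(sent[s[0]:s[1]])], s))
    chosen: Set[Span] = set()
    for span in candidates:
        if counts[tuple(sent[span[0]:span[1]])] < min_count:
            break  # candidates are sorted, so all remaining ones are rarer still
        if not any(crosses(span, c) for c in chosen):
            chosen.add(span)
    return chosen


# Toy corpus of tag sequences (illustrative only, not data from the thesis).
corpus = [["DET", "N", "V"], ["DET", "N", "V", "DET", "N"], ["V", "DET", "N"]]
counts = count_sequences(corpus)
print(sorted(select_brackets(corpus[1], counts)))  # [(0, 2), (0, 3), (3, 5)]
```

On this toy corpus, "DET N" occurs four times and is bracketed first in both positions; the overlapping "N V" span is then rejected because it crosses an already chosen bracket. Raw frequency favors short, common sequences, which is one reason context-based models are usually layered on top of such counts.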
Main Author: | Alcantara, Danniel L. |
---|---|
Format: | text (Master's thesis, PDF) |
Language: | English |
Published: | Animo Repository, 2008 |
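Subjects: | Natural language processing (Computer science); Computational linguistics; Constituent structure grammar; Phrase structure grammar; Computer Sciences |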
Online Access: | https://animorepository.dlsu.edu.ph/etd_masteral/3701 https://animorepository.dlsu.edu.ph/context/etd_masteral/article/10539/viewcontent/CDTG004414_P.pdf |
Institution: | De La Salle University |
id | oai:animorepository.dlsu.edu.ph:etd_masteral-10539 |
---|---|
record_format | eprints |
institution | De La Salle University |
building | De La Salle University Library |
continent | Asia |
country | Philippines |
content_provider | De La Salle University Library |
collection | DLSU Institutional Repository |
language | English |
topic | Natural language processing (Computer science); Computational linguistics; Constituent structure grammar; Phrase structure grammar; Computer Sciences |
description | Grammar formalisms and parsing systems are functional resources for high-level natural language processing applications. Filipino lacks an extensive computational representation of its grammar and a broad-coverage parsing mechanism. Computational approaches to automatic grammar development fall into either supervised or unsupervised categories. Supervised methods have produced better results, but they require treebanks or bracketed corpora as input, and no computational resources that satisfy these input requirements are currently available for Filipino. Unsupervised approaches instead use statistical and probabilistic information to estimate the structure of the input language. This research develops an unsupervised grammar induction system for the Filipino language, focusing on constituent structure. Three models are presented to handle the distribution and substitutability of constituents. The models were evaluated on 1,264 sentences of length 1 to 10. Experiments with the Selection Model showed that the occurrence of a sequence is the most effective measure for identifying constituency. The substitutable constituents learned by the Greedy Merge Model highlighted the free word order of Filipino. The Constituent Context Model achieved the best scores of the three: 66.8% precision, 72.6% recall, and a 69.5% overall F-measure. These results are comparable to those of existing unsupervised parse induction systems, even though the training corpus is a fraction of the size used in prior work. The models do not properly handle dependencies between words and phrases; addressing these dependencies is recommended to further improve performance. |
format | text |
author | Alcantara, Danniel L. |
title | Probabilistic approach to constituent structure induction for Filipino |
publisher | Animo Repository |
publishDate | 2008 |
url | https://animorepository.dlsu.edu.ph/etd_masteral/3701 https://animorepository.dlsu.edu.ph/context/etd_masteral/article/10539/viewcontent/CDTG004414_P.pdf |
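
The Constituent Context Model's reported 66.8% precision, 72.6% recall, and 69.5% overall measure are bracket-based scores. As a sketch, assuming the common unlabeled-bracket convention from the unsupervised parsing literature (constituents compared as word-offset spans, single-word and whole-sentence spans excluded, counts pooled over the whole test set) and assuming the overall measure is the F-measure (harmonic mean of precision and recall), the scores could be computed as below. The thesis's exact evaluation protocol may differ, and `nontrivial` and `bracket_scores` are hypothetical names.

```python
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) word offsets, end exclusive


def nontrivial(spans: Iterable[Span], length: int) -> Set[Span]:
    """Drop single-word spans and the whole-sentence span (assumed convention)."""
    return {(i, j) for (i, j) in spans if j - i > 1 and (i, j) != (0, length)}


def bracket_scores(
    gold: List[List[Span]], pred: List[List[Span]], lengths: List[int]
) -> Tuple[float, float, float]:
    """Corpus-level (micro-averaged) unlabeled bracket precision, recall, and F-measure."""
    matched = n_pred = n_gold = 0
    for g, p, n in zip(gold, pred, lengths):
        g_set, p_set = nontrivial(g, n), nontrivial(p, n)
        matched += len(g_set & p_set)  # induced brackets that also appear in the gold tree
        n_pred += len(p_set)
        n_gold += len(g_set)
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure


# Hypothetical example: "Kumain ang bata ng mangga" (5 words) with an illustrative
# gold bracketing and an induced bracketing that recovers one of its two brackets.
gold = [[(0, 5), (1, 3), (3, 5)]]
pred = [[(0, 5), (1, 3), (2, 5)]]
print(bracket_scores(gold, pred, [5]))  # (0.5, 0.5, 0.5)
```

Under this reading, the harmonic mean of 66.8% precision and 72.6% recall is about 69.6%, close to the reported 69.5%; a small gap like this can come from rounding or from averaging per sentence rather than pooling counts.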