A corpus based-Filipino grammar checker using hybrid N-gram rules from grammatically-correct terms

Bibliographic Details
Main Author: Go, Matthew Phillip
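Format: text
Language: English
Published: Animo Repository, 2016
Collection: Master's Theses (DLSU Institutional Repository)
Subjects: Filipino language--Grammar; Filipino language; Filipino language--Study and teaching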
Online Access:https://animorepository.dlsu.edu.ph/etd_masteral/5335
Institution: De La Salle University

Abstract
This study examines a corpus-based approach to detecting grammatical errors and suggesting corrections in Filipino. Before this study, the approach had not been applied to Filipino, although it had shown high potential for error detection and correction in other languages. Currently, Filipino grammar checkers are limited and mostly rule-based. A major limitation of these rule-based systems is that they can only detect errors explicitly defined in their rules, which restricts them to a narrow set of error types. The proposed approach, being corpus-based, instead learns grammar rules from a grammatically correct, tagged corpus and uses them to detect errors and provide suggestions. The grammar rules are hybrid n-grams composed of words, part-of-speech tags, and lemmas. Input sentences are compared against these rules, and a weighted Levenshtein edit distance algorithm determines whether an error is present. The approach can suggest five correction types: insertion, deletion, substitution, merging, and unmerging. It also covers a broad range of error types: incorrect affixes, misspellings, wrong word usage, missing words, unnecessary words, incorrectly merged words, and incorrectly unmerged words. Using only a small training corpus of 7,384 complex sentences, the developed system produced correct suggestions for 64.11% of 248 test phrases containing spelling or grammar errors, and achieved 70.95% accuracy in recognizing error-free words as correct on a corpus of 1,284 error-free words.
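The matching step described in the abstract can be illustrated with a small Python sketch. It is not the thesis's implementation: the tag set (`VB`, `DT`, `NN`, `PR`, `UNK`), the rule-generation scheme (every slot of every n-gram generalized to a word, lemma, or POS tag), the edit weights, and the helper names `hybrid_ngrams`, `weighted_edit_distance`, and `check` are all hypothetical. Merging and unmerging are modeled as two extra transitions in the usual Levenshtein dynamic program, alongside insertion, deletion, and substitution.

```python
from dataclasses import dataclass
from itertools import product
from typing import Sequence, Set, Tuple


@dataclass(frozen=True)
class Token:
    """A token from a tagged sentence (the tag set used here is hypothetical)."""
    word: str
    lemma: str
    pos: str


@dataclass(frozen=True)
class RuleElem:
    """One slot of a hybrid n-gram; `kind` is 'word', 'lemma', or 'pos'.
    `pos` records the tag of the corpus token the slot was built from."""
    kind: str
    value: str
    pos: str


# Hypothetical edit weights, not taken from the thesis.
INS = DEL = SUB = 1.0
SUB_SAME_POS = 0.5  # same POS but different form: likely misspelling or wrong word
MERGE = 0.5         # cost of either a merge or an unmerge correction


def hybrid_ngrams(sentence: Sequence[Token], n: int) -> Set[Tuple[RuleElem, ...]]:
    """All n-grams of a correct sentence, each slot generalized to word, lemma, or POS."""
    rules = set()
    for start in range(len(sentence) - n + 1):
        window = sentence[start:start + n]
        for kinds in product(("word", "lemma", "pos"), repeat=n):
            rules.add(tuple(RuleElem(k, getattr(t, k), t.pos) for k, t in zip(kinds, window)))
    return rules


def sub_cost(tok: Token, elem: RuleElem) -> float:
    """0 if the token satisfies the slot, cheaper if at least the POS agrees."""
    if getattr(tok, elem.kind) == elem.value:
        return 0.0
    return SUB_SAME_POS if tok.pos == elem.pos else SUB


def weighted_edit_distance(tokens: Sequence[Token], rule: Sequence[RuleElem]) -> float:
    n, m = len(tokens), len(rule)
    d = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # deletion: unnecessary input word
                d[i][j] = min(d[i][j], d[i - 1][j] + DEL)
            if j > 0:  # insertion: missing word
                d[i][j] = min(d[i][j], d[i][j - 1] + INS)
            if i > 0 and j > 0:  # match or substitution
                d[i][j] = min(d[i][j], d[i - 1][j - 1] + sub_cost(tokens[i - 1], rule[j - 1]))
            # merging: two input words should form one word of the rule
            if (i > 1 and j > 0 and rule[j - 1].kind == "word"
                    and tokens[i - 2].word + tokens[i - 1].word == rule[j - 1].value):
                d[i][j] = min(d[i][j], d[i - 2][j - 1] + MERGE)
            # unmerging: one input word should be split into two words of the rule
            if (i > 0 and j > 1 and rule[j - 2].kind == rule[j - 1].kind == "word"
                    and tokens[i - 1].word == rule[j - 2].value + rule[j - 1].value):
                d[i][j] = min(d[i][j], d[i - 1][j - 2] + MERGE)
    return d[n][m]


def check(tokens: Sequence[Token], rules: Set[Tuple[RuleElem, ...]]) -> Tuple[float, Tuple[RuleElem, ...]]:
    """(distance, rule) for the closest rule; a distance above 0 suggests an error."""
    return min(((weighted_edit_distance(tokens, r), r) for r in rules), key=lambda p: p[0])


corpus = [
    [Token("kumain", "kain", "VB"), Token("ang", "ang", "DT"), Token("bata", "bata", "NN")],
    [Token("para", "para", "PR"), Token("sa", "sa", "PR"), Token("bata", "bata", "NN")],
]
rules = set().union(*(hybrid_ngrams(s, 3) for s in corpus))

print(check([Token("kumain", "kain", "VB"), Token("ang", "ang", "DT"), Token("aso", "aso", "NN")], rules)[0])  # 0.0
print(check([Token("kumain", "kain", "VB"), Token("aso", "aso", "NN")], rules)[0])  # 1.0 (missing word)
print(check([Token("parasa", "parasa", "UNK"), Token("bata", "bata", "NN")], rules)[0])  # 0.5 (unmerge)
```

In this sketch, a distance of 0 means some learned rule covers the input exactly (for example, *kumain ang aso* matches a rule generalized to the lemma *kain* and the tag `NN`), so the input would be treated as error-free. A positive distance marks a likely error, and the cheapest operation on the alignment with the best rule, such as inserting *ang* or unmerging *parasa*, is what a suggestion would be built from; recovering that edit script would need a standard backtrace over `d`, which is omitted here.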