A corpus-based Filipino grammar checker using hybrid N-gram rules from grammatically-correct terms

Bibliographic Details
Main Author: Go, Matthew Phillip
Format: text
Language: English
Published: Animo Repository 2016
Online Access:https://animorepository.dlsu.edu.ph/etd_masteral/5335
Institution: De La Salle University
Description
Summary: This study examines a corpus-based approach to detecting grammatical errors and suggesting corrections for the Filipino language. Prior to this study, the approach had not been applied to Filipino, although it has shown high potential for error detection and correction in other languages. Currently, Filipino grammar checkers are limited and mostly rule-based. A major limitation of these existing systems is that they can only detect errors that were defined in the system, which results in a very limited set of detectable error types. The proposed approach, being corpus-based, learns grammar rules from a grammatically correct, tagged corpus and uses them to detect errors and provide suggestions. The grammar rules are hybrid n-grams composed of words, part-of-speech tags, and lemmas. Input sentences are compared against these rules, and a weighted Levenshtein edit distance algorithm determines whether an error is present. Using this approach, the following correction types can be suggested: insertion, deletion, substitution, merging, and unmerging. The approach also covers a broad range of error types, such as incorrect affixes, misspellings, wrong word usage, missing words, unnecessary words, incorrectly merged words, and incorrectly unmerged words. Using only a small training corpus of 7,384 complex sentences, the developed system scored 64.11% in producing correct suggestions for 248 test phrases containing spelling or grammar errors, and 70.95% accuracy in flagging error-free words in a corpus of 1,284 error-free words.
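To make the matching step concrete, here is a minimal sketch of how an input phrase might be scored against hybrid n-gram rules with a weighted Levenshtein distance. It is an illustration under stated assumptions, not the author's implementation. The names `Token`, `RuleElem`, `weighted_edit_distance`, and `closest_rule` are hypothetical. The operation weights and the POS tags are invented, because the summary does not give them. Merging and unmerging are modeled here as two extra dynamic-programming operations with an assumed lower cost.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class Token:
    """One tagged input word (assumed to come from a POS tagger and lemmatizer)."""
    word: str
    pos: str
    lemma: str

@dataclass(frozen=True)
class RuleElem:
    """One slot of a hybrid n-gram: constrains the word, the POS tag, or the lemma."""
    kind: str   # "word", "pos", or "lemma"
    value: str

    def matches(self, tok: Token) -> bool:
        return getattr(tok, self.kind) == self.value

# Hypothetical operation weights; the actual weights are not stated in the summary.
W_INSERT, W_DELETE, W_SUBST, W_MERGE, W_UNMERGE = 1.0, 1.0, 1.0, 0.5, 0.5

def weighted_edit_distance(tokens: Sequence[Token], rule: Sequence[RuleElem]) -> float:
    """Weighted Levenshtein distance between an input phrase and one hybrid n-gram rule,
    extended (as an assumption) with merge/unmerge operations for split or fused words."""
    n, m = len(tokens), len(rule)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:  # deletion: input word i is unnecessary
                best = min(best, d[i - 1][j] + W_DELETE)
            if j > 0:  # insertion: rule slot j is missing from the input
                best = min(best, d[i][j - 1] + W_INSERT)
            if i > 0 and j > 0:  # match (free) or substitution
                cost = 0.0 if rule[j - 1].matches(tokens[i - 1]) else W_SUBST
                best = min(best, d[i - 1][j - 1] + cost)
            # merging: two adjacent input words together spell one rule word
            if (i > 1 and j > 0 and rule[j - 1].kind == "word"
                    and tokens[i - 2].word + tokens[i - 1].word == rule[j - 1].value):
                best = min(best, d[i - 2][j - 1] + W_MERGE)
            # unmerging: one input word spells two consecutive rule words
            if (i > 0 and j > 1 and rule[j - 2].kind == "word" and rule[j - 1].kind == "word"
                    and rule[j - 2].value + rule[j - 1].value == tokens[i - 1].word):
                best = min(best, d[i - 1][j - 2] + W_UNMERGE)
            d[i][j] = best
    return d[n][m]

def closest_rule(tokens: Sequence[Token], rules: Sequence[Sequence[RuleElem]]) -> Tuple[float, int]:
    """Return (distance, rule index) of the nearest rule; 0.0 means no error was found."""
    return min((weighted_edit_distance(tokens, r), k) for k, r in enumerate(rules))

if __name__ == "__main__":
    # Hypothetical rule and tags, for illustration only.
    rule = [RuleElem("word", "ang"), RuleElem("word", "pinakamahusay"),
            RuleElem("word", "na"), RuleElem("pos", "NN")]
    phrase = [Token("ang", "DT", "ang"), Token("pinaka", "JJ", "pinaka"),
              Token("mahusay", "JJ", "husay"), Token("na", "LM", "na"),
              Token("guro", "NN", "guro")]
    print(closest_rule(phrase, [rule]))  # (0.5, 0): "pinaka mahusay" should be merged
```

Mixing word, POS, and lemma slots in a single rule lets it generalize. For example, the final `pos` slot above accepts any noun, so one learned rule can cover many phrases. A phrase at distance 0.0 from some rule is treated as error-free. Otherwise, the cheapest edit path to the nearest rule suggests the correction type.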