A STATISTICAL AND RULE-BASED GRAMMAR CHECKER FOR INDONESIAN TEXT

Bibliographic Details
Main Author:	FAHDA (NIM : 13513079), ASANILTA
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/21279
Institution:	Institut Teknologi Bandung

id	id-itb.:21279
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Spelling and grammar errors are not uncommon in Indonesian text, even in formal contexts such as academic or bureaucratic documents, yet proper language use is essential for expressing ideas clearly in writing. Spelling and grammar checkers are widely used tools that help detect and correct writing errors. However, there is currently no proofreading system that checks both spelling and grammar errors in Indonesian text. This study therefore proposes a prototype Indonesian spelling and grammar checker that combines rules with statistical methods. The prototype currently has 38 rules, written as regular expressions, that detect, correct, and explain common errors in punctuation, word choice, and spelling. The spelling checker then examines every word, using a dictionary trie to find misspellings and Damerau-Levenshtein distance neighbors as correction candidates, along with morphological analysis for processing certain word forms. A Hidden Markov Model based on bigrams or co-occurrences ranks and selects the candidates. The grammar checker uses a trigram language model over tokens, POS tags, or phrase chunks to identify sentences with incorrect structure, according to an empirically chosen threshold. In experiments, the co-occurrence HMM with an emission probability weight coefficient of 0.95 and a transition probability weight coefficient of 0.05 was selected as the most suitable model for the spelling checker. For the grammar checker, the phrase chunk model that normalizes by chunk length and uses a threshold score of -0.4 gave the best results. These best-performing parameter values are applied in the final system. The document evaluation of the system showed an overall accuracy of 83.18%. The prototype is implemented as a web application.
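The candidate generation and ranking steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the thesis implementation: the function names, the toy lexicon, and the linear scan over the lexicon (the thesis uses a dictionary trie) are hypothetical. The distance shown is the restricted Damerau-Levenshtein variant (optimal string alignment), and the abstract does not say how the 0.95 and 0.05 weights are combined, so the weighted sum of log-probabilities below is only one plausible reading.

```python
import math


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def candidates(word: str, lexicon: set, max_distance: int = 1) -> list:
    """Lexicon words within max_distance of word (linear scan, not a trie)."""
    return sorted(w for w in lexicon if osa_distance(word, w) <= max_distance)


def weighted_score(emission_p: float, transition_p: float,
                   w_emit: float = 0.95, w_trans: float = 0.05) -> float:
    """ASSUMPTION: weights applied to log-probabilities; higher is better."""
    return w_emit * math.log(emission_p) + w_trans * math.log(transition_p)


# Hypothetical toy lexicon of Indonesian words, for illustration only.
LEXICON = {"makan", "makna", "malam", "main"}

if __name__ == "__main__":
    print(candidates("mkaan", LEXICON))
```

With weights this skewed toward the emission term, a candidate's ranking in this sketch is driven mostly by how likely the observed misspelling is given the candidate word, and the surrounding context acts only as a tie-breaker.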
format	Final Project
author	FAHDA (NIM : 13513079), ASANILTA
title	A STATISTICAL AND RULE-BASED GRAMMAR CHECKER FOR INDONESIAN TEXT
url	https://digilib.itb.ac.id/gdl/view/21279

Similar Items