An experimental Tagalog Finite State Automata spellchecker with Levenshtein edit-distance feature

© 2019 IEEE. In this paper, we present the experimental development of a spell checker for the Tagalog language, using a word list of 300 random root words and three inflected forms as training data and a two-layered architecture that combines a Deterministic Finite Automaton (DFA) with Levenshte...

Bibliographic Details
Main Authors: Imperial, Joseph Marvin R.; Ya-On, Czeritonnie Gail V.; Ureta, Jennifer C.
Format: text
Published: Animo Repository 2019
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/871
https://animorepository.dlsu.edu.ph/context/faculty_research/article/1870/type/native/viewcontent
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:faculty_research-1870
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:faculty_research-1870 2022-07-21T00:38:45Z 2019-11-01T07:00:00Z text text/html Faculty Research Work Animo Repository
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
description © 2019 IEEE. In this paper, we present the experimental development of a spell checker for the Tagalog language, using a word list of 300 random root words and three inflected forms as training data and a two-layered architecture that combines a Deterministic Finite Automaton (DFA) with Levenshtein edit distance. A DFA processes a string to determine whether it belongs to a given language, returning a binary result of accept or reject. The Levenshtein edit distance between two strings is the minimum number (k) of deletions, alterations (substitutions), and insertions needed to turn one sequence of characters into the other. On the sample word list, results show that an edit-distance value of k = 1 can be effective for spell checking Tagalog sentences. Any value greater than 1 can cause words to be suggested even when the input is spelled correctly, because certain characters, such as a, n, g, t, s, and l, are used selectively and prominently in Tagalog.
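The two layers named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trie-shaped DFA, the names `WordDFA`, `levenshtein`, `suggest`, and the six-word `LEXICON` are all assumptions made for illustration, and the toy lexicon stands in for the 300-root-word list, which is not reproduced in this record.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon; the paper's actual word list is not available here.
LEXICON: List[str] = ["bahay", "buhay", "bata", "basa", "kain", "kumain"]


class WordDFA:
    """Trie-shaped DFA: states are integers and state 0 is the start state."""

    def __init__(self, words: List[str]) -> None:
        self.transitions: List[Dict[str, int]] = [{}]  # state -> {char: next state}
        self.accepting: Set[int] = set()
        for word in words:
            self._add(word)

    def _add(self, word: str) -> None:
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.accepting.add(state)

    def accepts(self, word: str) -> bool:
        """Layer 1: binary accept/reject for a candidate word."""
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                return False  # no transition defined: reject
            state = nxt
        return state in self.accepting


def levenshtein(a: str, b: str) -> int:
    """Minimum number of deletions, substitutions, and insertions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word: str, dfa: WordDFA, lexicon: List[str], k: int = 1) -> List[str]:
    """Layer 2: for a rejected word, list lexicon entries within distance k."""
    if dfa.accepts(word):
        return []
    return sorted(w for w in lexicon if levenshtein(word, w) <= k)


dfa = WordDFA(LEXICON)
print(suggest("bahai", dfa, LEXICON, k=1))
print(suggest("bahai", dfa, LEXICON, k=2))
```

In this toy setting, the misspelling "bahai" yields a single candidate at k = 1 and four candidates at k = 2, which gives a rough sense of how quickly the candidate set grows with k when many words share the same frequent letters. Unlike the behaviour reported in the description, this sketch never offers suggestions for a word the DFA accepts.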
format text
author Imperial, Joseph Marvin R.
Ya-On, Czeritonnie Gail V.
Ureta, Jennifer C.
title An experimental Tagalog Finite State Automata spellchecker with Levenshtein edit-distance feature
publisher Animo Repository
publishDate 2019
url https://animorepository.dlsu.edu.ph/faculty_research/871
https://animorepository.dlsu.edu.ph/context/faculty_research/article/1870/type/native/viewcontent
_version_ 1740844621727006720