Exploring hybrid linguistic feature sets to measure Filipino text readability

The proper identification of difficulty levels of reading materials prescribed in an educational setting is key to effective learning and comprehension. Educators and publishers have relied on readability formulas to predict text readability. While the English language boasts a rich history...

Bibliographic Details
Main Author: Imperial, Joseph Marvin R.
Format: text
Language: English
Published: Animo Repository 2021
Subjects:
Online Access: https://animorepository.dlsu.edu.ph/etdm_comsci/5
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1000&context=etdm_comsci
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etdm_comsci-1000
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etdm_comsci-1000
2021-09-08T04:25:40Z
Exploring hybrid linguistic feature sets to measure Filipino text readability
Imperial, Joseph Marvin R.
2021-04-05T07:00:00Z
text
application/pdf
https://animorepository.dlsu.edu.ph/etdm_comsci/5
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1000&context=etdm_comsci
Computer Science Master's Theses
English
Animo Repository
Readability (Literary style)
Evaluation
Filipino language
Neural networks (Computer science)
Children's books
Computer Sciences
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Readability (Literary style)
Evaluation
Filipino language
Neural networks (Computer science)
Children's books
Computer Sciences
description The proper identification of difficulty levels of reading materials prescribed in an educational setting is key to effective learning and comprehension. Educators and publishers have relied on readability formulas to predict text readability. While the English language boasts a rich history of research in readability assessment, limited work has been done for the Filipino language. This study explores an extensive range of expert-identified linguistic predictors, spanning traditional, lexical, language model, syllable pattern, and morphological features, to train an automatic readability assessment model using Logistic Regression, Support Vector Machines, and Random Forest. Over 265 story books and passages from Adarna House Inc. and DepEd Commons covering Grades 1, 2, and 3 were used to train the models. Results of the feature selection process show that the best-performing configuration, at 66.1% accuracy, is a hybrid Random Forest model that combines traditional (TRAD) and syllable pattern (SYLL) features. Global and local model interpretation showed that surface-based features used in older readability formulas, such as word count, average sentence length, and sentence count, remain relevant in measuring the readability of Filipino texts, but that combining them with deeper linguistic features would yield better model performance. Future directions include using various types of written literature, not only story books, to develop a more generalized readability assessment model, as well as using deep neural networks for automatic feature extraction.
Keywords: Readability Assessment, Filipino, Linguistic Features, Story Books
format text
author Imperial, Joseph Marvin R.
title Exploring hybrid linguistic feature sets to measure Filipino text readability
title_sort exploring hybrid linguistic feature sets to measure filipino text readability
publisher Animo Repository
publishDate 2021
url https://animorepository.dlsu.edu.ph/etdm_comsci/5
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1000&context=etdm_comsci
_version_ 1710755611168210944
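
The abstract names word count, sentence count, and average sentence length as traditional (TRAD) features and mentions syllable pattern (SYLL) features. The following is a minimal sketch of how such surface features might be computed, not the thesis's implementation. The function names, the regular expressions, and the consonant/vowel encoding (including treating "ng" as a single consonant) are assumptions made for illustration; this record does not define the actual TRAD or SYLL features.

```python
import re

VOWELS = set("aeiou")

def traditional_features(text: str) -> dict:
    """Compute the three surface-based counts named in the abstract.

    Assumption: sentences end at '.', '!' or '?', and a word is a run of
    letters, optionally joined by hyphens (e.g. reduplications).
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text)
    word_count = len(words)
    sentence_count = len(sentences)
    avg_len = word_count / sentence_count if sentence_count else 0.0
    return {
        "word_count": word_count,
        "sentence_count": sentence_count,
        "avg_sentence_length": avg_len,
    }

def cv_pattern(word: str) -> str:
    """Map a word to a consonant/vowel string such as 'cvcv'.

    Assumption: every 'ng' is one consonant. This is a simplification;
    the thesis's syllable pattern features may be defined differently.
    """
    collapsed = word.lower().replace("ng", "N")  # placeholder for the digraph
    return "".join(
        "v" if ch in VOWELS else "c" for ch in collapsed if ch.isalpha()
    )

if __name__ == "__main__":
    print(traditional_features("Si Ana ay bata. Mahilig siya sa aklat!"))
    print(cv_pattern("aklat"), cv_pattern("ngiti"))
```

Feature dictionaries like these could then be passed to any of the classifiers the abstract lists; which library and settings the author used is not stated in this record.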