A hybrid approach to extracting the 5Ws in Filipino news articles

The goal of this research is to develop an information extraction system for Filipino news articles that extracts the 5Ws, namely, sino (who), ano (what), kailan (when), saan (where), and bakit (why), and produces an output which can reduce the effort required for further data analysis. Utilizing the...

Bibliographic Details
Main Authors: Chua, Jedrick L., Livelo, Evan Dennison S., Ver, Andrea Nicole O., Yao, John Paul S.
Format: text
Language: English
Published: Animo Repository 2016
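Subjects: Text processing (Computer science); Natural language processing (Computer science); Information retrieval; Computer Sciences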
Online Access: https://animorepository.dlsu.edu.ph/etd_bachelors/11501
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:etd_bachelors-12146
record_format eprints
spelling oai:animorepository.dlsu.edu.ph:etd_bachelors-12146 2022-12-17T03:36:34Z 2016-01-01T08:00:00Z Bachelor's Theses
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
language English
topic Text processing (Computer science)
Natural language processing (Computer science)
Information retrieval
Computer Sciences
description The goal of this research is to develop an information extraction system for Filipino news articles that extracts the 5Ws, namely, sino (who), ano (what), kailan (when), saan (where), and bakit (why), and produces an output which can reduce the effort required for further data analysis. Utilizing the output of the information extraction system, an interface is provided to allow its users to view, search, and edit the extracted data in a structured format. The information extraction system applies both rule-based and machine learning techniques, as well as various tools, to perform text processing, candidate selection, and feature extraction. The functions that fall under text processing include tokenization, sentence segmentation, named-entity recognition, part-of-speech tagging, and word scoring. Afterwards, rule-based candidate selection is performed using both the output of the text processing module and text markers. Subsequently, feature extraction is done through machine-learned candidate classification models for the who, when, and where features and rule-based algorithms for the what and why features. Furthermore, the information extraction system was evaluated alongside the system in the research of Cagampan (2014) in order to compare the results against a similar system that extracts the same features. However, the system in Cagampan's research is optimized for Filipino editorials as opposed to news articles. The proponents' system was able to achieve 63.3257% accuracy for 'who', 71.3768% accuracy for 'when', 58.2492% accuracy for 'where', 89.2% accuracy for 'what', and 50% accuracy for 'why'. In comparison to Cagampan's system, the 'who', 'where', and 'what' feature extraction modules of the proponents' system performed better.
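The rule-based candidate selection step mentioned in the description could look roughly like the sketch below. It is an illustration only, not the proponents' implementation: the marker lists in MARKERS, the helper names tokenize and select_candidates, and the window parameter are all assumptions, and the thesis's actual rules are not given in this record. The sketch scans tokens for common Filipino text markers and collects the words that follow each one as candidates for the matching W.

```python
import re

# Hypothetical marker lists chosen for illustration; the thesis's actual
# rules and markers are not given in this record.
MARKERS = {
    "who": ["si", "sina", "ni", "nina", "kay"],
    "where": ["sa"],
    "when": ["noong", "nitong"],
    "why": ["dahil", "sapagkat"],
}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def select_candidates(tokens, window=3):
    """Return (feature, phrase) pairs built from the tokens after a marker.

    Up to `window` tokens following each marker are collected, stopping
    early at the first punctuation token.
    """
    candidates = []
    for i, tok in enumerate(tokens):
        for feature, markers in MARKERS.items():
            if tok.lower() not in markers:
                continue
            phrase = []
            for nxt in tokens[i + 1 : i + 1 + window]:
                if not nxt[0].isalnum():  # punctuation ends the phrase
                    break
                phrase.append(nxt)
            if phrase:
                candidates.append((feature, " ".join(phrase)))
    return candidates


if __name__ == "__main__":
    # "Juan arrived in Manila last Monday."
    sentence = "Dumating si Juan sa Maynila noong Lunes."
    print(select_candidates(tokenize(sentence), window=1))
```

A marker-only rule like this over-generates (for example, "sa" also introduces non-locative phrases), which is consistent with the description's design of passing the who, when, and where candidates to machine-learned classifiers rather than accepting them directly.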
format text
author Chua, Jedrick L.
Livelo, Evan Dennison S.
Ver, Andrea Nicole O.
Yao, John Paul S.
title A hybrid approach to extracting the 5Ws in Filipino news articles
publisher Animo Repository
publishDate 2016
url https://animorepository.dlsu.edu.ph/etd_bachelors/11501
_version_ 1753806416882171904