A hybrid approach to extracting the 5Ws in Filipino news articles
The goal of this research is to develop an information extraction system for Filipino news articles that extracts the 5Ws, namely, sino (who), ano (what), kailan (when), saan (where), and bakit (why) and produces an output which can reduce the effort required for further data analysis. Utilizing the...
Main Authors: | Chua, Jedrick L.; Livelo, Evan Dennison S.; Ver, Andrea Nicole O.; Yao, John Paul S. |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository, 2016 |
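Subjects: | Text processing (Computer science); Natural language processing (Computer science); Information retrieval; Computer Sciences |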
Online Access: | https://animorepository.dlsu.edu.ph/etd_bachelors/11501 |
Institution: | De La Salle University |
id |
oai:animorepository.dlsu.edu.ph:etd_bachelors-12146 |
---|---|
record_format |
eprints |
spelling |
oai:animorepository.dlsu.edu.ph:etd_bachelors-12146 2022-12-17T03:36:34Z A hybrid approach to extracting the 5Ws in Filipino news articles Chua, Jedrick L. Livelo, Evan Dennison S. Ver, Andrea Nicole O. Yao, John Paul S. The goal of this research is to develop an information extraction system for Filipino news articles that extracts the 5Ws, namely, sino (who), ano (what), kailan (when), saan (where), and bakit (why) and produces an output which can reduce the effort required for further data analysis. Utilizing the output of the information extraction system, an interface is provided to allow its users to view, search, and edit the extracted data in a structured format. The information extraction system applies both rule-based and machine learning techniques as well as various tools in order to perform text processing, candidate selection, and feature extraction. The functions that fall under text processing include tokenization, sentence segmentation, named-entity recognition, part-of-speech tagging, and word scoring. Afterwards, rule-based candidate selection is performed by utilizing both the output of the text processing module as well as text markers. Subsequently, feature extraction is done through both machine-learned candidate classification models for the who, when, and where features and rule-based algorithms for the what and why features. Furthermore, the information extraction system was evaluated alongside the system in the research of Cagampan (2014) in order to compare the results against a similar system that extracts the same features. However, the system in Cagampan's research is optimized for Filipino editorials as opposed to news articles. The proponents' system was able to achieve 63.3257% accuracy for 'who', 71.3768% accuracy for 'when', 58.2492% accuracy for 'where', 89.2% accuracy for 'what', and 50% accuracy for 'why'.
In comparison to Cagampan's system, the 'who', 'where', and 'what' feature extraction modules of the proponents' system performed better. 2016-01-01T08:00:00Z text https://animorepository.dlsu.edu.ph/etd_bachelors/11501 Bachelor's Theses English Animo Repository Text processing (Computer science) Natural language processing (Computer science) Information retrieval Computer Sciences |
institution |
De La Salle University |
building |
De La Salle University Library |
continent |
Asia |
country |
Philippines |
content_provider |
De La Salle University Library |
collection |
DLSU Institutional Repository |
language |
English |
topic |
Text processing (Computer science) Natural language processing (Computer science) Information retrieval Computer Sciences |
spellingShingle |
Text processing (Computer science) Natural language processing (Computer science) Information retrieval Computer Sciences Chua, Jedrick L. Livelo, Evan Dennison S. Ver, Andrea Nicole O. Yao, John Paul S. A hybrid approach to extracting the 5Ws in Filipino news articles |
description |
The goal of this research is to develop an information extraction system for Filipino news articles that extracts the 5Ws, namely, sino (who), ano (what), kailan (when), saan (where), and bakit (why) and produces an output which can reduce the effort required for further data analysis. Utilizing the output of the information extraction system, an interface is provided to allow its users to view, search, and edit the extracted data in a structured format.
The information extraction system applies both rule-based and machine learning techniques as well as various tools in order to perform text processing, candidate selection, and feature extraction. The functions that fall under text processing include tokenization, sentence segmentation, named-entity recognition, part-of-speech tagging, and word scoring. Afterwards, rule-based candidate selection is performed by utilizing both the output of the text processing module as well as text markers. Subsequently, feature extraction is done through both machine-learned candidate classification models for the who, when, and where features and rule-based algorithms for the what and why features.
Furthermore, the information extraction system was evaluated alongside the system in the research of Cagampan (2014) in order to compare the results against a similar system that extracts the same features. However, the system in Cagampan's research is optimized for Filipino editorials as opposed to news articles.
The proponents' system was able to achieve 63.3257% accuracy for 'who', 71.3768% accuracy for 'when', 58.2492% accuracy for 'where', 89.2% accuracy for 'what', and 50% accuracy for 'why'. In comparison to Cagampan's system, the 'who', 'where', and 'what' feature extraction modules of the proponents' system performed better. |
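The rule-based candidate selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the marker lists in `MARKERS`, the function names `tokenize` and `select_candidates`, and the "run of capitalized tokens after a marker" rule are all assumptions chosen for illustration. The sketch covers candidate selection only; in the described system, the who, when, and where candidates would then be passed to machine-learned classifiers.

```python
import re

# Hypothetical marker lists, for illustration only; not taken from the thesis.
MARKERS = {
    "who": {"si", "sina"},       # personal-name markers
    "where": {"sa"},             # locative marker
    "when": {"noong", "nitong"}, # time markers
}


def tokenize(sentence):
    """Split a sentence into word tokens, dropping punctuation."""
    return re.findall(r"[\w-]+", sentence)


def select_candidates(sentence):
    """Return {feature: [candidate phrases]} using marker-based rules.

    Assumed rule: a candidate is the run of capitalized tokens that
    directly follows a marker word.
    """
    tokens = tokenize(sentence)
    candidates = {feature: [] for feature in MARKERS}
    for i, token in enumerate(tokens):
        for feature, markers in MARKERS.items():
            if token.lower() not in markers:
                continue
            phrase = []
            for following in tokens[i + 1:]:
                if following[0].isupper():
                    phrase.append(following)
                else:
                    break  # the capitalized run has ended
            if phrase:
                candidates[feature].append(" ".join(phrase))
    return candidates
```

For example, `select_candidates("Dumating si Juan Cruz sa Maynila noong Lunes.")` yields "Juan Cruz" as a who candidate, "Maynila" as a where candidate, and "Lunes" as a when candidate. A real system would presumably rely on the named-entity and part-of-speech output mentioned in the abstract rather than on capitalization alone.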
format |
text |
author |
Chua, Jedrick L. Livelo, Evan Dennison S. Ver, Andrea Nicole O. Yao, John Paul S. |
author_facet |
Chua, Jedrick L. Livelo, Evan Dennison S. Ver, Andrea Nicole O. Yao, John Paul S. |
author_sort |
Chua, Jedrick L. |
title |
A hybrid approach to extracting the 5Ws in Filipino news articles |
title_short |
A hybrid approach to extracting the 5Ws in Filipino news articles |
title_full |
A hybrid approach to extracting the 5Ws in Filipino news articles |
title_fullStr |
A hybrid approach to extracting the 5Ws in Filipino news articles |
title_full_unstemmed |
A hybrid approach to extracting the 5Ws in Filipino news articles |
title_sort |
hybrid approach to extracting the 5ws in filipino news articles |
publisher |
Animo Repository |
publishDate |
2016 |
url |
https://animorepository.dlsu.edu.ph/etd_bachelors/11501 |
_version_ |
1753806416882171904 |