A hybrid agent for automatically determining and extracting the 5Ws of Filipino news articles

Bibliographic Details
Main Authors: Livelo, Evan Dennison S., Ver, Andrea Nicole O., Chua, Jedrick L., Yao, John Paul S., Cheng, Charibeth K.
Format: text
Published: Animo Repository 2017
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/3314
Institution: De La Salle University
id oai:animorepository.dlsu.edu.ph:faculty_research-4297
record_format eprints
institution De La Salle University
building De La Salle University Library
continent Asia
country Philippines
content_provider De La Salle University Library
collection DLSU Institutional Repository
topic Text data mining
Computer Sciences
Software Engineering
description Copying permitted for private and academic purposes. As the number of sources of unstructured data continues to grow exponentially, manually reading through all of this data becomes notoriously time-consuming, so there is a need for faster understanding and processing of it. This can be achieved by automating the task through information extraction. In this paper, we present an agent that automatically detects and extracts the 5Ws (who, when, where, what, and why) from Filipino news articles using a hybrid of machine learning and linguistic rules. The agent caters specifically to the Filipino language by handling its distinctive features, such as ambiguous prepositions and markers, focus in place of subject and predicate, and dialect influences. To get the most out of the machine learning algorithms, techniques such as linguistic tagging and weighted decision trees are used to preprocess and filter the data and to refine the final results. The agent achieved an accuracy of 63.33% for who, 71.38% for when, 58.25% for where, 89.20% for what, and 50.00% for why. Copyright © by the paper's authors.
format text
author Livelo, Evan Dennison S.
Ver, Andrea Nicole O.
Chua, Jedrick L.
Yao, John Paul S.
Cheng, Charibeth K.
title A hybrid agent for automatically determining and extracting the 5Ws of Filipino news articles
publisher Animo Repository
publishDate 2017
url https://animorepository.dlsu.edu.ph/faculty_research/3314