A hybrid agent for automatically determining and extracting the 5Ws of Filipino news articles

Bibliographic Details
Main Authors: Livelo, Evan Dennison S., Ver, Andrea Nicole O., Chua, Jedrick L., Yao, John Paul S., Cheng, Charibeth K.
Format: text
Published: Animo Repository 2017
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/3314
Institution: De La Salle University
Description
Summary: Copying permitted for private and academic purposes. As the number of sources of unstructured data continues to grow exponentially, manually reading through all this data becomes prohibitively time-consuming. Thus, there is a need for faster understanding and processing of this data, which can be achieved by automating the task through information extraction. In this paper, we present an agent that automatically detects and extracts the 5Ws (who, when, where, what, and why) from Filipino news articles using a hybrid of machine learning and linguistic rules. The agent caters specifically to the Filipino language by working with its unique features, such as ambiguous prepositions and markers, focus instead of subject and predicate, and dialect influences, among others. To get the most out of the machine learning algorithms, techniques such as linguistic tagging and weighted decision trees are used to preprocess and filter the data and to refine the final results. The results show that the agent achieved an accuracy of 63.33% for who, 71.38% for when, 58.25% for where, 89.20% for what, and 50.00% for why. Copyright © by the paper's authors.