A hybrid agent for automatically determining and extracting the 5Ws of Filipino news articles
Main Authors: | |
---|---|
Format: | text |
Published: | Animo Repository, 2017 |
Online Access: | https://animorepository.dlsu.edu.ph/faculty_research/3314 |
Institution: | De La Salle University |
Summary: | Copying permitted for private and academic purposes. As the number of sources of unstructured data continues to grow exponentially, manually reading through all of this data becomes extremely time-consuming. There is therefore a need for faster ways to understand and process it, which can be achieved by automating the task through information extraction. In this paper, we present an agent that automatically detects and extracts the 5Ws, namely the who, when, where, what, and why, from Filipino news articles using a hybrid of machine learning and linguistic rules. The agent caters specifically to the Filipino language by handling its distinctive features, such as ambiguous prepositions and markers, the use of focus rather than subject and predicate, and dialect influences, among others. To make the most of the machine learning algorithms, techniques such as linguistic tagging and weighted decision trees are used to preprocess and filter the data and to refine the final results. The agent achieved an accuracy of 63.33% for who, 71.38% for when, 58.25% for where, 89.20% for what, and 50.00% for why. Copyright © by the paper's authors. |
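The summary describes a hybrid of linguistic rules and machine learning that must cope with ambiguous Filipino markers. As a rough illustration of the rule side only, the sketch below shows how markers such as "si", "noong", "dahil", and the ambiguous "sa" could propose 5W candidates that are then ranked by a score. Everything here is an assumption for illustration, not the authors' agent: the marker-to-slot table, the hand-picked weights (standing in for the trained, weighted decision trees the paper mentions), the partial day/month list, the "first word is the what" heuristic, and the helper names `candidates` and `extract_5ws` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical rules: marker -> (5W slot, hand-picked weight).
# The weights are placeholders for what a trained model would supply.
MARKER_RULES = {
    "si": ("who", 1.0), "sina": ("who", 1.0),
    "ni": ("who", 0.8), "nina": ("who", 0.8),
    "noong": ("when", 1.0), "nitong": ("when", 1.0),
    "dahil": ("why", 1.0), "upang": ("why", 0.9),
}
# Partial, illustrative list of day/month names used to disambiguate "sa".
TIME_WORDS = {"lunes", "martes", "miyerkules", "huwebes", "biyernes",
              "sabado", "linggo", "enero", "pebrero", "marso", "abril"}
# Markers that end a candidate span (hypothetical choice).
STOP_WORDS = {"ang", "ng", "sa", "na", "si", "sina", "ni", "nina",
              "noong", "nitong", "dahil", "upang"}
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


@dataclass
class Candidate:
    slot: str
    text: str
    score: float


def span_after(tokens, start, max_len=4):
    """Join up to max_len tokens from `start`, stopping at a marker or punctuation."""
    out = []
    for tok in tokens[start:start + max_len]:
        if tok.lower() in STOP_WORDS or not tok[0].isalnum():
            break
        out.append(tok)
    return " ".join(out)


def candidates(sentence):
    tokens = TOKEN_RE.findall(sentence)
    cands = []
    # Simplification: Filipino clauses are typically predicate-initial,
    # so the first non-marker word is taken as the "what" candidate.
    if tokens and tokens[0].lower() not in STOP_WORDS:
        cands.append(Candidate("what", tokens[0], 1.0))
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in MARKER_RULES:
            slot, weight = MARKER_RULES[low]
            nxt = i + 1
            # "dahil sa" means "because of": skip the "sa" after "dahil".
            if low == "dahil" and nxt < len(tokens) and tokens[nxt].lower() == "sa":
                nxt += 1
            text = span_after(tokens, nxt)
        elif low == "sa":
            # "sa" is ambiguous: inside "dahil sa" it belongs to the reason,
            # before a day/month name it marks time, otherwise a location.
            if i > 0 and tokens[i - 1].lower() == "dahil":
                continue
            nxt_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
            slot, weight = ("when", 0.7) if nxt_word in TIME_WORDS else ("where", 0.7)
            text = span_after(tokens, i + 1)
        else:
            continue
        if text:
            # Capitalized who/where spans are more likely proper names.
            bonus = 0.5 if slot in ("who", "where") and text[0].isupper() else 0.0
            cands.append(Candidate(slot, text, weight + bonus))
    return cands


def extract_5ws(sentence):
    """Keep the highest-scoring candidate per slot (first one wins ties)."""
    best = {}
    for c in candidates(sentence):
        if c.slot not in best or c.score > best[c.slot].score:
            best[c.slot] = c
    return {slot: c.text for slot, c in best.items()}


print(extract_5ws("Magpupulong sina Ana at Ben sa Lunes sa Cebu."))
```

For "Magpupulong sina Ana at Ben sa Lunes sa Cebu." the two occurrences of "sa" are resolved differently: the first, before "Lunes", becomes a when candidate, and the second, before "Cebu", becomes a where candidate. A real system would replace the fixed word list and weights with learned features, which is where the machine learning half of the hybrid comes in.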