Document classification of Filipino online scam incident text using data mining techniques

Bibliographic Details
Main Authors: Palad, Eddie Bouy B., Tangkeko, Marivic S., Magpantay, Lissa Andrea K., Sipin, Glenn L.
Format: text
Published: Animo Repository 2019
Online Access: https://animorepository.dlsu.edu.ph/faculty_research/2696
Institution: De La Salle University
Description
Summary: The increasing number of online transactions and other internet activities has given rise to the proliferation of online scams. The Philippine National Police - Anti Cybercrime Group (PNP-ACG) reported that complaints increased from a double-digit figure in 2013 to a triple-digit figure in 2017. The challenge of addressing this problem in the Philippines is shared by other developing countries in Southeast Asia and other parts of the world. Since the PNP-ACG was established in 2013, cybercrime data have continued to accumulate but have received little attention in research. Previous studies highlight the importance of taking advantage of data mining. However, the absence of empirical studies on cybercrime analytics in the country suggests that data mining is underused in facilitating cybercrime investigations. This study uses the Weka text mining tool to draw insights by classifying a given online scam dataset. The dataset consisted of unstructured online scam data containing a total of 14,098 mainly Filipino words. J48 Decision Tree, Naïve Bayes, and Sequential Minimal Optimization (SMO) were used to build classification models. The three classifiers were compared in terms of performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate, followed by Naïve Bayes and then the SMO classifier. Also, the responses during validation reveal that J48 is preferred over the other classifiers as it is easy to understand and apply in cybercrime investigations. This demonstrates how text mining can assist the PNP-ACG in analyzing online scam criminal data, and it highlights the importance of employing data mining tools in the legal and criminal investigation domains in the Philippines.
Further work can be carried out using different and more inclusive cybercrime datasets, other classification techniques in Weka or any other data mining tool, and other data mining tasks such as crime prevention and prediction, clustering, and finding leads, trends, and patterns of criminal activities. © 2019 IEEE.
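
Illustrative note: the summary describes training text classifiers on scam incident reports and comparing them by accuracy and error rate. The sketch below shows that general workflow for one of the three classifier families named, Naïve Bayes, written in plain Python. It is not the authors' Weka setup. The toy English reports, the labels `online_selling` and `investment`, and all function names are hypothetical and were invented for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (invented for illustration, not from the study).
TRAIN = [
    ("seller took payment item never delivered", "online_selling"),
    ("paid for phone seller blocked me no delivery", "online_selling"),
    ("promised double profit investment money gone", "investment"),
    ("investment scheme high profit no payout", "investment"),
]
TEST = [
    ("item paid never delivered by seller", "online_selling"),
    ("profit from investment never paid out", "investment"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(1 for text, label in docs if predict(model, text) == label)
    return correct / len(docs)


if __name__ == "__main__":
    model = train_naive_bayes(TRAIN)
    acc = accuracy(model, TEST)
    print(f"accuracy={acc:.2f} error_rate={1 - acc:.2f}")
```

Comparing several classifiers, as the study does, would amount to computing the same accuracy and error rate for each model on the same held-out reports.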