An enhanced Malay named entity recognition using clustering and classification approach for crime textual data analysis

Bibliographic Details
Main Author: Salleh, Muhammad Sharilazlan
Format: Thesis
Language: English
Published: 2018
Subjects:
Online Access: http://eprints.utem.edu.my/id/eprint/23326/1/An%20Enhanced%20Malay%20Named%20Entity%20Recognition%20Using%20Clustering%20and%20Classification%20Approach%20For%20Crime%20Textual%20Data%20Analysis.pdf
http://eprints.utem.edu.my/id/eprint/23326/2/An%20enhanced%20malay%20named%20entity%20recognition%20using%20clustering%20and%20classification%20approach%20for%20crime%20textual%20data%20analysis.pdf
http://eprints.utem.edu.my/id/eprint/23326/
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=112736
Institution: Universiti Teknikal Malaysia Melaka
Description
Summary: Named Entity Recognition (NER) is one of the tasks in information extraction. NER extracts and classifies words or entities that belong to the proper-noun category in text data, such as person names, locations, organizations and dates. Today, social media and other online sources such as web pages, blogs, Facebook, Twitter, Instagram and online newspapers are among the major sources of data for information extraction. These resources contain various types of unstructured data, such as text. However, little work has been done on processing this type of data for Malay Named Entity Recognition (MNER). This lack of Malay text analytics has made it difficult to extract information for decision making. This research presents an MNER technique for crime data analysis in the Malay language, using text extracted from the Polis Diraja Malaysia (PDRM) news web page. The technique uses multiple stages of clustering and classification, with Fuzzy C-Means and the K-Nearest Neighbors algorithm as the methods. The methods involve multi-layer feature extraction to recognize entities such as person name, location, organization, date and crime type. The multi-stage technique obtained 95.24% accuracy in recognizing named entities for text analysis, particularly in Malay. The proposed technique can improve the accuracy of named entity recognition on crime data by selecting features suited to the Malay language.
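The summary names Fuzzy C-Means and K-Nearest Neighbors but does not say how the stages are chained. The following is a minimal Python sketch of one plausible arrangement, not the thesis's actual pipeline: Fuzzy C-Means supplies cluster membership degrees, these are appended to each token's features as a second feature layer, and K-Nearest Neighbors assigns the entity label. The three features, the toy vectors, the labels and every function name here are illustrative assumptions.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(point, centers, m=2.0):
    """Fuzzy C-Means membership degrees of one point (they sum to 1)."""
    dists = [dist(point, c) for c in centers]
    hits = [1.0 if d < 1e-12 else 0.0 for d in dists]
    if sum(hits) > 0:  # the point sits on a centre: full membership there
        return [h / sum(hits) for h in hits]
    inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
    return [v / sum(inv) for v in inv]


def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Alternate membership and centre updates for a fixed number of rounds."""
    centers = random.Random(seed).sample(points, c)
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in points]
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]  # fuzzified weights for cluster i
            centers.append([sum(wj * p[k] for wj, p in zip(w, points)) / sum(w)
                            for k in range(len(points[0]))])
    return centers


def augment(x, centers):
    """Second feature layer: raw features plus cluster memberships."""
    return list(x) + memberships(x, centers)


def knn_predict(train_x, train_y, x, k=3):
    """Majority label among the k nearest training vectors."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], x))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def predict(tokens, labels, centers, x, k=3):
    """Classify x with KNN in the membership-augmented feature space."""
    train = [augment(t, centers) for t in tokens]
    return knn_predict(train, labels, augment(x, centers), k)


# Toy, made-up vectors: [capitalisation, digit ratio, location-cue score]
TOKENS = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.1], [1.0, 0.0, 1.0],
          [0.9, 0.0, 1.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
LABELS = ["PERSON", "PERSON", "LOCATION", "LOCATION", "DATE", "DATE"]

CENTERS = fuzzy_c_means(TOKENS, c=3)
print(predict(TOKENS, LABELS, CENTERS, [0.95, 0.0, 0.05]))
```

In this sketch the clustering stage needs no labels, so it could in principle be fitted on a larger pool of unlabelled tokens before the labelled examples are used for classification. Whether the thesis does this is not stated in the record.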