MULTILABEL PREDICTION OF INDONESIA'S LEGAL REGULATION DOCUMENTS USING TEXT MINING

Bibliographic Details
Main Author: Larasati, Karanissa (NIM 13413016)
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/22751
Institution: Institut Teknologi Bandung
Description
Summary: PT ABC is one of the largest legal media companies in Indonesia. Its services consist of providing legal news and information and collecting legal documents, especially Indonesian legal regulations. To date, PT ABC holds 52,255 Indonesian legal regulation documents.

PT ABC intended to create a new product to enhance its ‘Data Center (Pusat Data)’ feature, which PT ABC manages itself. It eventually worked with Dattabot, one of the biggest big data companies in Indonesia, to create a new product named “ABC Advanced Search”. This product aims to make it easier for users to find the right document for their needs among the Indonesian legal regulations collected by PT ABC. During the process, however, several labels had to be removed because they did not describe the topic or content of the documents. As a result, 24,084 documents have no label. These labels are important for data collection and for integrating search results with other features on PT ABC’s website.

One alternative for automatic labeling is text mining. With a multilabel classification model, the labels of any document can be predicted automatically in seconds. The performance measure used in this context is recall. After various parameter tuning attempts, the prediction model achieves recall scores as high as 90% using the Support Vector Machine (SVM) and Naïve Bayes algorithms. However, other supporting performance measures indicate that the labels produced by SVM are of better quality than those produced by Naïve Bayes. This study also designed an application prototype that PT ABC could use to make future labeling easier and mistake-proof.
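
The summary names recall as the main measure and notes that other supporting measures were needed to tell the two algorithms apart. The sketch below illustrates why recall alone may not separate two multilabel predictors. It is a minimal illustration and not the study's code: the choice of micro-averaging, the use of precision as the supporting measure, the function name `micro_recall_precision`, and all labels and predictions are assumptions made up for this example.

```python
from typing import List, Set, Tuple


def micro_recall_precision(
    true_labels: List[Set[str]], predicted_labels: List[Set[str]]
) -> Tuple[float, float]:
    """Return micro-averaged (recall, precision) pooled over all documents."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both lists must describe the same documents")
    # A true positive is a label that is both assigned and correct.
    true_positives = sum(len(t & p) for t, p in zip(true_labels, predicted_labels))
    total_true = sum(len(t) for t in true_labels)
    total_predicted = sum(len(p) for p in predicted_labels)
    recall = true_positives / total_true if total_true else 0.0
    precision = true_positives / total_predicted if total_predicted else 0.0
    return recall, precision


# Hypothetical gold labels for three regulation documents.
gold = [{"tax", "customs"}, {"labor"}, {"banking", "tax"}]

# Hypothetical model A: recovers every true label and adds nothing extra.
model_a = [{"tax", "customs"}, {"labor"}, {"banking", "tax"}]

# Hypothetical model B: recovers every true label but also adds wrong ones.
model_b = [
    {"tax", "customs", "labor"},
    {"labor", "banking"},
    {"banking", "tax", "customs"},
]

recall_a, precision_a = micro_recall_precision(gold, model_a)
recall_b, precision_b = micro_recall_precision(gold, model_b)
print(f"A: recall={recall_a:.3f} precision={precision_a:.3f}")
print(f"B: recall={recall_b:.3f} precision={precision_b:.3f}")
```

In this made-up data both models reach the same recall, yet model B attaches extra, incorrect labels to every document, which only the second measure exposes.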