MULTILABEL PREDICTION OF INDONESIA'S LEGAL REGULATION DOCUMENTS USING TEXT MINING
PT ABC is one of the largest legal media companies in Indonesia. Its services consist of providing legal news and information and collecting legal documents, especially Indonesia's legal regulations. To date, PT ABC holds 52,255 Indonesian legal regulation documents. ...
Main Author: | Larasati - NIM 13413016 , Karanissa |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/22751 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:22751 |
---|---|
spelling |
id-itb.:22751 2017-09-29T08:26:57Z MULTILABEL PREDICTION OF INDONESIA'S LEGAL REGULATION DOCUMENTS USING TEXT MINING Larasati - NIM 13413016 , Karanissa Indonesia Final Project INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/22751 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
PT ABC is one of the largest legal media companies in Indonesia. Its services consist of providing legal news and information and collecting legal documents, especially Indonesia's legal regulations. To date, PT ABC holds 52,255 Indonesian legal regulation documents.

PT ABC intended to create a new product to enhance its 'Data Center (Pusat Data)' feature, which PT ABC manages itself. It eventually worked with Dattabot, one of the largest big data companies in Indonesia, to create a product named "ABC Advanced Search". The product aims to help users find the right document for their needs among the Indonesian legal regulations collected by PT ABC. During the process, however, several labels had to be removed because they did not describe the topic or the content of the documents. As a result, 24,084 documents have no label. These labels are important for data collection and for integrating search results with other features of PT ABC's website.

Text mining is one option for automatic labeling. With a multilabel classification model, the labels of any document can be predicted automatically in seconds. The main performance measure used in this context is recall. After various parameter tuning attempts, the prediction model reaches recall scores as high as 90% with the Support Vector Machine (SVM) and Naïve Bayes algorithms. However, other supporting performance measures indicate that the labels produced by SVM are of better quality than those produced by Naïve Bayes. This study also designed an application prototype that PT ABC could use for an easier, mistake-proof labeling process in the future. |
format |
Final Project |
author |
Larasati - NIM 13413016 , Karanissa |
title |
MULTILABEL PREDICTION OF INDONESIA'S LEGAL REGULATION DOCUMENTS USING TEXT MINING |
url |
https://digilib.itb.ac.id/gdl/view/22751 |
_version_ |
1822019884970672128 |
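
The abstract describes multilabel prediction evaluated by recall but gives no implementation details. The following is only a minimal sketch of that general idea, not the author's method: it assumes a binary relevance setup (one yes/no classifier per label), uses a small hand-written multinomial Naïve Bayes, and scores predictions with micro-averaged recall. The toy texts, the label names, and all function and class names are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would likely add stemming
    # and stop-word removal for Indonesian text (assumption).
    return text.lower().split()


class BinaryNB:
    """Multinomial Naive Bayes for a single yes/no label (Laplace smoothing)."""

    def fit(self, docs, flags):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for tokens, flag in zip(docs, flags):
            self.counts[flag].update(tokens)
            self.n_docs[flag] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, flag):
        total_docs = self.n_docs[True] + self.n_docs[False]
        # Smoothed class prior.
        score = math.log((self.n_docs[flag] + 1) / (total_docs + 2))
        denom = sum(self.counts[flag].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words unseen in training are ignored
                score += math.log((self.counts[flag][tok] + 1) / denom)
        return score

    def predict(self, tokens):
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def fit_multilabel(texts, label_sets):
    # Binary relevance: train one independent classifier per label.
    docs = [tokenize(t) for t in texts]
    labels = sorted(set().union(*label_sets))
    return {
        lab: BinaryNB().fit(docs, [lab in s for s in label_sets])
        for lab in labels
    }


def predict_multilabel(models, text):
    tokens = tokenize(text)
    return {lab for lab, model in models.items() if model.predict(tokens)}


def micro_recall(true_sets, pred_sets):
    # Recall = true positives / (true positives + false negatives),
    # pooled over every (document, label) pair.
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical toy data, not taken from the PT ABC collection.
train_texts = [
    "income tax rate for corporate taxpayers",
    "value added tax on imported goods",
    "customs duty on imported goods",
    "minimum wage for workers",
]
train_labels = [{"tax"}, {"tax", "trade"}, {"trade"}, {"labor"}]

models = fit_multilabel(train_texts, train_labels)
predictions = [predict_multilabel(models, t) for t in train_texts]
print("micro-averaged recall on the toy data:",
      micro_recall(train_labels, predictions))
```

Recall rewards recovering every true label of a document and does not penalize extra predicted labels, which may be why the abstract also refers to supporting measures when comparing SVM with Naïve Bayes. In practice the per-label classifier could be swapped for an SVM from an established library; that substitution is not shown here.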