PROCESSING CRIMINAL CASE TEXTS FOR DETERMINATION OF THE INDONESIAN PENAL CODE ARTICLE USING HYBRID DEEP LEARNING METHOD
Main Author: | Wahyu Candra Kusuma, Adi |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/79650 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:79650 |
---|---|
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
description |
Indonesia is a state founded on the rule of law, as mandated by Article 1
Paragraph 3 of the 1945 Constitution. This principle means that all aspects of
communal, social, and national life are grounded in law. As a state, Indonesia
plays a formal role in connecting the various activities and interactions within
society. Order in the interactions among Indonesian citizens is maintained and
regulated by various laws, one of which is the Criminal Code (Kitab Undang-Undang
Hukum Pidana), commonly referred to as the KUHP. The KUHP is an explicitly
formulated collection of criminal laws, each provision of which rests on specific
values, principles, and norms that prioritize the public interest. Members of the
public are sometimes unfamiliar with the applicable criminal laws, which can lead
to criminal cases being misidentified; even victims may not know which articles
apply to their case.
Technological advances drive innovation in information technology, particularly
in Natural Language Processing (NLP), a field of Artificial Intelligence (AI).
Progress in deep learning for NLP makes it feasible to build systems that search
for the criminal articles and laws relevant to a case. Previous research applied
CNN and BiLSTM methods in a Siamese architecture with reasonably good results,
and in that comparison BiLSTM outperformed CNN. Subsequent studies compared
several algorithms and found the hybrid CNN-BiLSTM method to be the most
effective, outperforming standalone CNN and BiLSTM. Another study, comparing
Word2Vec, GloVe, and FastText embeddings with a CNN, found that FastText yielded
the best results.
This suggests that word embedding approaches other than Word2Vec are worth
exploring, especially in combination with deep learning methods for text
classification. This study therefore implements a deep learning approach based on
the hybrid CNN-BiLSTM method with Word2Vec, GloVe, and FastText word embeddings,
and uses single CNN and single BiLSTM models as baselines. The dataset was
collected from the websites of the Central, East, North, West, and South Jakarta
district courts and split into 80% training data, 10% validation data, and 10%
test data. Embedding representations were trained with Gensim using the Word2Vec
(CBOW) algorithm, GloVe, and FastText, with a context window size of 5 and a
vector size of 100. The hybrid CNN-BiLSTM model takes two inputs, one for the
document and one for the query, and was built with Keras and TensorFlow using an
embedding layer size of 100, kernel sizes of 2, 3, 4, and 5, a maximum sequence
length (maxlen) of 30, a batch size of 32, filters and BiLSTM units set to 258,
20 epochs, learning rates of 0.001, 0.005, 0.01, and 0.05, and dropout rates of
0.2, 0.3, 0.5, and 0.7. The experiments show that the hybrid CNN-BiLSTM model
combined with Word2Vec, GloVe, and FastText word embeddings can effectively
support the search for KUHP articles and generally outperforms the single CNN and
single BiLSTM models. Combined with FastText, it achieved the highest accuracy
(0.982) and precision (0.91), and it also surpassed the other two methods in Mean
Reciprocal Rank, with an MRR at rank 5 of 0.165. Future research could explore
transformer-based deep learning methods such as BERT, which may better capture
the meaning of the texts in the dataset. |
format |
Theses |
author |
Wahyu Candra Kusuma, Adi |
keywords |
Hybrid CNN-BiLSTM, deep learning, CNN, BiLSTM, word embedding |
timestamp |
2024-01-14T23:46:20Z |
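The abstract describes an 80/10/10 split of the court-decision dataset into training, validation, and test data. The sketch below shows one way to produce such a split reproducibly. The function name `split_80_10_10`, the seeded shuffle, and the placeholder case IDs are illustrative assumptions; the abstract does not say how the split was drawn (for example, whether it was purely random or stratified by article).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of `items` and cut it into train/validation/test parts (80/10/10)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains becomes the test set
    return train, val, test


if __name__ == "__main__":
    cases = [f"case_{i}" for i in range(100)]  # hypothetical case identifiers
    train, val, test = split_80_10_10(cases)
    print(len(train), len(val), len(test))  # 80 10 10
```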
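Word2Vec in CBOW mode learns to predict each word from the words around it, and the abstract reports a context window size of 5. The following simplified sketch, which is not Gensim's implementation, lists the (context, target) training pairs that a fixed window of 5 would produce for a tokenized sentence. Gensim, like the original word2vec tool, by default samples a smaller effective window for each target word, so its real contexts can be shorter. The function name `cbow_pairs` and the example sentence are hypothetical.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 5) -> List[Tuple[List[str], str]]:
    """List the (context words, target word) pairs seen by CBOW with a fixed window.

    Simplification: Gensim by default shrinks the window randomly per target
    word (the shrink_windows option in recent versions), so its contexts can be shorter.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]   # up to `window` words before the target
        right = tokens[i + 1:i + 1 + window]  # up to `window` words after the target
        context = left + right
        if context:  # a one-word sentence has no context to learn from
            pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    # Hypothetical tokenized fragment of an Indonesian case description.
    sentence = "terdakwa mengambil barang milik orang lain".split()
    for context, target in cbow_pairs(sentence, window=5):
        print(target, "<-", context)
```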
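Both model inputs (document and query) must be integer sequences of a fixed length before they reach the embedding layer, and the abstract gives a maxlen of 30. A minimal pure-Python sketch of that step follows, assuming whitespace-tokenized text, a vocabulary in which ID 0 is padding and ID 1 marks unknown words, and front ("pre") padding and truncation, which are the defaults of Keras' `pad_sequences`. The helper names `build_vocab` and `encode_and_pad` are hypothetical, and the abstract does not state which padding side the author used.

```python
from typing import Dict, List

PAD, UNK = "<pad>", "<unk>"


def build_vocab(texts: List[List[str]]) -> Dict[str, int]:
    """Assign integer IDs to tokens; 0 is reserved for padding and 1 for unknown words."""
    vocab = {PAD: 0, UNK: 1}
    for tokens in texts:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode_and_pad(tokens: List[str], vocab: Dict[str, int], maxlen: int = 30) -> List[int]:
    """Map tokens to IDs, then pre-truncate and pre-pad to exactly `maxlen` IDs.

    'pre' padding and truncation mirror the defaults of Keras' pad_sequences.
    """
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens]
    ids = ids[-maxlen:]  # keep the last `maxlen` IDs when the text is too long
    return [vocab[PAD]] * (maxlen - len(ids)) + ids


if __name__ == "__main__":
    vocab = build_vocab([["pencurian", "barang", "milik", "orang"]])  # hypothetical corpus
    query = ["pencurian", "motor"]  # "motor" is out of vocabulary
    print(encode_and_pad(query, vocab, maxlen=5))  # [0, 0, 0, 2, 1]
```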
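The abstract reports MRR at rank 5 (0.165) alongside accuracy and precision. Assuming one correct KUHP article per query, MRR@5 averages over all queries the reciprocal of the position at which the correct article appears among the top five results, counting 0 when it is absent. The sketch below computes it from ranked candidate lists. The scores, the `top_k` and `mrr_at_k` helpers, and the use of labels such as "Pasal 362" as example identifiers are hypothetical; the abstract does not say how the model's outputs are turned into a ranking.

```python
from typing import Dict, List, Sequence


def top_k(scores: Dict[str, float], k: int = 5) -> List[str]:
    """Return candidate article IDs ordered by descending score, cut to the top k."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def mrr_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int = 5) -> float:
    """Mean Reciprocal Rank truncated at k, assuming one correct article per query.

    A query contributes 1/r if its correct article is at 1-based position r
    within the top k results, and 0 otherwise.
    """
    if len(ranked) != len(gold):
        raise ValueError("need exactly one ranked list per gold article")
    if not gold:
        return 0.0
    total = 0.0
    for preds, answer in zip(ranked, gold):
        top = list(preds)[:k]
        if answer in top:
            total += 1.0 / (top.index(answer) + 1)
    return total / len(gold)


if __name__ == "__main__":
    # Hypothetical model scores for one query over three candidate KUHP articles.
    scores = {"Pasal 362": 0.91, "Pasal 363": 0.40, "Pasal 372": 0.75}
    print(top_k(scores, k=2))  # ['Pasal 362', 'Pasal 372']

    ranked = [
        ["Pasal 362", "Pasal 363", "Pasal 372"],  # correct article ranked 1st -> 1
        ["Pasal 378", "Pasal 372", "Pasal 362"],  # correct article ranked 2nd -> 1/2
        ["Pasal 362", "Pasal 363", "Pasal 372"],  # correct article missing -> 0
    ]
    gold = ["Pasal 362", "Pasal 372", "Pasal 378"]
    print(mrr_at_k(ranked, gold, k=5))  # 0.5
```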