PROCESSING CRIMINAL CASE TEXTS FOR DETERMINATION OF THE INDONESIAN PENAL CODE ARTICLE USING HYBRID DEEP LEARNING METHOD

Bibliographic Details
Main Author:	Wahyu Candra Kusuma, Adi
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/79650
Institution:	Institut Teknologi Bandung

id	id-itb.:79650
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Indonesia is founded on the principle of the rule of law, as mandated by Article 1 Paragraph 3 of the 1945 Constitution. This principle implies that all aspects of community, social, and national life are governed by law. The state plays a formal role in connecting the various activities and interactions within society. Order in interactions among Indonesian citizens is maintained and regulated by various laws, one of which is the Criminal Code, commonly referred to as the KUHP. The KUHP is an explicitly formulated collection of criminal laws in which each provision rests on specific values, principles, and norms that prioritize the public interest. The public is often unfamiliar with the applicable criminal laws, which can lead to criminal cases being misidentified; even victims may not know which laws apply to their cases. Advances in information technology, particularly in Natural Language Processing (NLP), a field of Artificial Intelligence (AI), offer a way to address this. In particular, progress in deep learning for NLP makes it feasible to build solutions that search for the criminal articles and laws relevant to a case. Previous research applied CNN and BiLSTM methods in a Siamese architecture and reported reasonably good results, with BiLSTM outperforming CNN. Subsequent studies compared several algorithms and found the hybrid CNN-BiLSTM method to be the most effective, ahead of both CNN and BiLSTM alone. In addition, research comparing Word2Vec, GloVe, and FastText embeddings with the CNN method found that FastText yielded the best results. This suggests that word embedding methods other than Word2Vec are worth exploring, especially in combination with deep learning methods for text classification. 
Therefore, this study implements a deep learning approach based on the hybrid CNN-BiLSTM method, with word embeddings from Word2Vec, GloVe, and FastText. Single CNN and single BiLSTM models serve as baselines. Experimental results indicate that the hybrid CNN-BiLSTM combined with Word2Vec, GloVe, and FastText word embeddings can effectively address the search for KUHP articles. In general, the hybrid CNN-BiLSTM method outperforms the single CNN and single BiLSTM methods: it achieves the highest accuracy and precision, 0.982 and 0.91 respectively, in conjunction with the FastText model. On Mean Reciprocal Rank (MRR), the hybrid CNN-BiLSTM method also surpasses the other two methods, with a Rank 5 MRR value of 0.165. The dataset was collected from the websites of the Central, East, North, West, and South Jakarta district courts. It is divided into three parts: 80% training data, 10% validation data, and 10% test data. The experiments train on the data with Gensim to construct vector embedding representations, using the Word2Vec (CBOW), GloVe, and FastText algorithms with a context window size of 5 and a vector size of 100. The hybrid CNN-BiLSTM model is trained with two inputs, one for documents and one for queries. Using the Keras and TensorFlow libraries, the model parameters are an embedding layer size of 100; kernel sizes of 2, 3, 4, and 5; a maxlen of 30; a batch size of 32; filters and BiLSTM units of 258; 20 epochs; learning rates of 0.001, 0.005, 0.01, and 0.05; and dropout values of 0.2, 0.3, 0.5, and 0.7. 
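
The hyperparameter values listed above can be collected into a small search grid. The following is a minimal sketch, not the author's code: the abstract does not say whether learning rates and dropout values were combined exhaustively, or whether the four kernel sizes were used together or one at a time. This sketch assumes a full grid over embedding, learning rate, and dropout, treats the kernel sizes as one fixed set, and reads "filters and BiLSTM units of 258" as 258 for each. All dictionary keys and the `build_configs` helper are hypothetical names.

```python
from itertools import product

# Values taken from the abstract; key names are illustrative only.
FIXED = {
    "embedding_dim": 100,
    "kernel_sizes": (2, 3, 4, 5),  # assumption: used together, not searched
    "maxlen": 30,
    "batch_size": 32,
    "filters": 258,
    "bilstm_units": 258,
    "epochs": 20,
}
EMBEDDINGS = ("word2vec", "glove", "fasttext")
LEARNING_RATES = (0.001, 0.005, 0.01, 0.05)
DROPOUTS = (0.2, 0.3, 0.5, 0.7)


def build_configs():
    """Return one config dict per (embedding, learning rate, dropout) triple."""
    configs = []
    for embedding, lr, dropout in product(EMBEDDINGS, LEARNING_RATES, DROPOUTS):
        configs.append(
            {**FIXED, "embedding": embedding, "learning_rate": lr, "dropout": dropout}
        )
    return configs


configs = build_configs()
print(len(configs), "candidate configurations")
```

Under these assumptions the grid has 3 x 4 x 4 = 48 entries, each of which would correspond to one 20-epoch training run.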
The hybrid model achieves the highest accuracy (0.982), precision (0.91), and Mean Reciprocal Rank values, demonstrating its superiority over the single CNN and BiLSTM methods. Future research could explore transformer-based deep learning methods, such as BERT, which may better capture the meaning of the texts in the dataset.
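
For readers unfamiliar with the Mean Reciprocal Rank metric reported above, the sketch below shows one common way to compute it with a rank cutoff of 5. It is an illustration under assumptions: the abstract does not give the exact definition used in the thesis, the function names are hypothetical, and the article labels and rankings are made-up example data, not results from the study.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Hashable, k: int = 5) -> float:
    """Return 1/rank of the relevant item within the top k, or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked[:k], start=1):
        if candidate == relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant_items: Sequence[Hashable],
    k: int = 5,
) -> float:
    """Average the reciprocal rank over all queries (0.0 for an empty input)."""
    if len(rankings) != len(relevant_items):
        raise ValueError("each ranking needs exactly one relevant item")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, rel, k) for r, rel in zip(rankings, relevant_items))
    return total / len(rankings)


# Made-up example: three case queries, each with a ranked list of KUHP articles.
rankings = [
    ["Pasal 362", "Pasal 363", "Pasal 365"],  # correct article ranked 1st
    ["Pasal 351", "Pasal 338", "Pasal 340"],  # correct article ranked 2nd
    ["Pasal 378", "Pasal 372", "Pasal 263"],  # correct article not retrieved
]
relevant = ["Pasal 362", "Pasal 338", "Pasal 480"]

mrr_at_5 = mean_reciprocal_rank(rankings, relevant, k=5)
print(f"MRR@5 = {mrr_at_5:.3f}")
```

In this toy example the three reciprocal ranks are 1, 1/2, and 0, so the mean is 0.5. A query whose correct article falls outside the top 5 contributes 0, which is why MRR can be low even when classification accuracy is high.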
format	Theses
author	Wahyu Candra Kusuma, Adi
title	PROCESSING CRIMINAL CASE TEXTS FOR DETERMINATION OF THE INDONESIAN PENAL CODE ARTICLE USING HYBRID DEEP LEARNING METHOD
url	https://digilib.itb.ac.id/gdl/view/79650
_version_	1822996402945392640
spelling	id-itb.:79650 2024-01-14T23:46:20Z PROCESSING CRIMINAL CASE TEXTS FOR DETERMINATION OF THE INDONESIAN PENAL CODE ARTICLE USING HYBRID DEEP LEARNING METHOD Wahyu Candra Kusuma, Adi Indonesia Theses Hybrid CNN-BiLSTM, deep learning, CNN, BiLSTM, word embedding. INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/79650 text

Similar Items