Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights
Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection utilizing bidirectional encoder representations from transformers (BERT)-generated embeddings, an enhanced artificial bee colony...
Main Authors: | Xiong, Jiale, Yang, Jing, Yang, Lei, Awais, Muhammad, Khan, Abdullah Ayub, Alizadehsani, Roohallah, Acharya, U. Rajendra |
---|---|
Format: | Article |
Published: | Elsevier 2024 |
Subjects: | |
Online Access: | http://eprints.um.edu.my/44306/ https://doi.org/10.1016/j.eswa.2023.122088 |
Institution: | Universiti Malaya |
id |
my.um.eprints.44306 |
---|---|
record_format |
eprints |
spelling |
my.um.eprints.44306 2024-07-05T03:06:52Z http://eprints.um.edu.my/44306/ Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights Xiong, Jiale Yang, Jing Yang, Lei Awais, Muhammad Khan, Abdullah Ayub Alizadehsani, Roohallah Acharya, U. Rajendra QA75 Electronic computers. Computer science TK Electrical engineering. Electronics Nuclear engineering Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection utilizing bidirectional encoder representations from transformers (BERT)-generated embeddings, an enhanced artificial bee colony (ABC) optimization algorithm for pre-training, and a training process based on reinforcement learning (RL). The BERT model can be incorporated into a subsequent task and meticulously refined to function as a model, enabling it to apprehend a variety of linguistic characteristics. Imbalanced classification is one of the fundamental obstacles to PD. To handle this predicament, we present a novel methodology utilizing RL, in which the problem is framed as a series of sequential decisions in which an agent receives a reward at each level for classifying a received instance. To address the disparity between classes, it is determined that the majority class will receive a lower reward than the minority class. We also focus on the training stage, which often utilizes gradient-based learning techniques like backpropagation (BP), leading to certain drawbacks such as sensitivity to initialization. In our proposed model, we utilize a mutual learning-based ABC (ML-ABC) approach that adjusts the food source with the most beneficial results for the candidate by considering a mutual learning factor that incorporates the initial weight. 
We evaluated the efficacy of our novel approach by contrasting its results with those of population-based techniques using three standard datasets, namely Stanford Natural Language Inference (SNLI), Microsoft Research Paraphrase Corpus (MSRP), and Semantic Evaluation Database (SemEval2014). Our model attained excellent results that outperformed state-of-the-art models. Optimal values for important parameters, including the reward function, are identified for the model based on experiments on the study dataset. Ablation studies that exclude the proposed ML-ABC and reinforcement learning from the model confirm the independent positive incremental impact of these components on model performance. Elsevier 2024-03-15 Article PeerReviewed Xiong, Jiale and Yang, Jing and Yang, Lei and Awais, Muhammad and Khan, Abdullah Ayub and Alizadehsani, Roohallah and Acharya, U. Rajendra (2024) Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights. Expert Systems with Applications, 238 (E). ISSN 0957-4174, DOI https://doi.org/10.1016/j.eswa.2023.122088 <https://doi.org/10.1016/j.eswa.2023.122088>. https://doi.org/10.1016/j.eswa.2023.122088 10.1016/j.eswa.2023.122088 |
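The abstract's class-weighted reward scheme, in which a correct decision on a minority-class instance earns more than one on the majority class, can be sketched as follows. The function name and the specific reward values (1.0 vs. 0.5) are illustrative assumptions, not taken from the paper:

```python
def step_reward(true_label, predicted_label, minority_class,
                reward_minority=1.0, reward_majority=0.5):
    """Imbalance-aware per-step reward for an RL classification agent.

    A correct decision earns the class-specific reward; a wrong one
    is penalised by the same magnitude. The asymmetry between the
    minority and majority rewards is what counteracts class imbalance.
    Values are illustrative, not the paper's exact settings.
    """
    r = reward_minority if true_label == minority_class else reward_majority
    return r if predicted_label == true_label else -r
```

For example, with class 1 as the minority, a correct minority prediction yields 1.0 while a correct majority prediction yields only 0.5, so the agent cannot maximise return by always predicting the majority class.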
institution |
Universiti Malaya |
building |
UM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaya |
content_source |
UM Research Repository |
url_provider |
http://eprints.um.edu.my/ |
topic |
QA75 Electronic computers. Computer science TK Electrical engineering. Electronics Nuclear engineering |
spellingShingle |
QA75 Electronic computers. Computer science TK Electrical engineering. Electronics Nuclear engineering Xiong, Jiale Yang, Jing Yang, Lei Awais, Muhammad Khan, Abdullah Ayub Alizadehsani, Roohallah Acharya, U. Rajendra Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights |
description |
Plagiarism detection (PD) in natural language processing involves locating similar words in two distinct sources. The paper introduces a new approach to plagiarism detection utilizing bidirectional encoder representations from transformers (BERT)-generated embeddings, an enhanced artificial bee colony (ABC) optimization algorithm for pre-training, and a training process based on reinforcement learning (RL). The BERT model can be incorporated into a subsequent task and meticulously refined to function as a model, enabling it to apprehend a variety of linguistic characteristics. Imbalanced classification is one of the fundamental obstacles to PD. To handle this predicament, we present a novel methodology utilizing RL, in which the problem is framed as a series of sequential decisions in which an agent receives a reward at each level for classifying a received instance. To address the disparity between classes, it is determined that the majority class will receive a lower reward than the minority class. We also focus on the training stage, which often utilizes gradient-based learning techniques like backpropagation (BP), leading to certain drawbacks such as sensitivity to initialization. In our proposed model, we utilize a mutual learning-based ABC (ML-ABC) approach that adjusts the food source with the most beneficial results for the candidate by considering a mutual learning factor that incorporates the initial weight. We evaluated the efficacy of our novel approach by contrasting its results with those of population-based techniques using three standard datasets, namely Stanford Natural Language Inference (SNLI), Microsoft Research Paraphrase Corpus (MSRP), and Semantic Evaluation Database (SemEval2014). Our model attained excellent results that outperformed state-of-the-art models. Optimal values for important parameters, including the reward function, are identified for the model based on experiments on the study dataset. 
Ablation studies that exclude the proposed ML-ABC and reinforcement learning from the model confirm the independent positive incremental impact of these components on model performance. |
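The ML-ABC pretraining idea described above, an ABC-style food-source (candidate weight vector) update augmented with a mutual-learning term that incorporates the initial weights, can be sketched roughly as below. The update rule, the `ml_factor` parameter, and all names are illustrative assumptions rather than the paper's exact formulation:

```python
import random

def ml_abc_update(x, partner, x_init, ml_factor=0.5):
    """One ABC-style candidate update for a food source (weight vector) x.

    As in classic ABC, a single randomly chosen dimension is perturbed
    toward a randomly selected partner solution. The extra mutual-learning
    term pulls that dimension toward the initial weights x_init, which is
    one plausible reading of the ML-ABC idea; the paper's exact rule may
    differ.
    """
    j = random.randrange(len(x))          # perturb one dimension only
    phi = random.uniform(-1.0, 1.0)       # standard ABC scaling factor
    v = list(x)
    v[j] = x[j] + phi * (x[j] - partner[j]) + ml_factor * (x_init[j] - x[j])
    return v
```

In a full ABC loop the candidate `v` would replace `x` only if it scores better under the fitness function (greedy selection); the resulting population best would then serve as the initialization for gradient-based fine-tuning, mitigating backpropagation's sensitivity to initial weights.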
format |
Article |
author |
Xiong, Jiale Yang, Jing Yang, Lei Awais, Muhammad Khan, Abdullah Ayub Alizadehsani, Roohallah Acharya, U. Rajendra |
author_facet |
Xiong, Jiale Yang, Jing Yang, Lei Awais, Muhammad Khan, Abdullah Ayub Alizadehsani, Roohallah Acharya, U. Rajendra |
author_sort |
Xiong, Jiale |
title |
Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights |
title_short |
Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights |
title_full |
Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights |
title_fullStr |
Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights |
title_full_unstemmed |
Efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights |
title_sort |
efficient reinforcement learning-based method for plagiarism detection boosted by a population-based algorithm for pretraining weights |
publisher |
Elsevier |
publishDate |
2024 |
url |
http://eprints.um.edu.my/44306/ https://doi.org/10.1016/j.eswa.2023.122088 |
_version_ |
1805881155607592960 |