Code problem similarity detection using code clones and pretrained models

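There are many websites hosting code contests, such as LeetCode, Codeforces and CodeChef. These contests attract on average 20k technology enthusiasts, as getting a good rank can improve their problem-solving skills and strengthen their resume during a job search. These contests typically support solving problems in multiple programming languages, such as Python, C++ and Java. However, given the vast number of problems on these sites, some will inevitably be duplicates of, or very similar to, one another. Duplicated problems in a contest are undesirable because contestants may copy solution source code from an older problem published before the contest, gaining undeserved points and making the standings unfair. This paper proposes a solution to detect similar code problems on Codeforces, the world's most popular competitive programming website, with over 100k active users. Similarity between problems is determined from their accepted solution source code (not the problem text).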
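The abstract says problem similarity is judged from accepted solution source code rather than from problem statements, but the record does not describe the project's actual pipeline. The following is therefore only a minimal sketch under stated assumptions: it swaps in a simple token-shingle Jaccard measure with identifier renaming (a rough stand-in for Type-2 clone detection) in place of the clone detectors and pretrained models named in the title. The functions `similar_pairs` and `problem_similarity`, the `KEYWORDS` set, the shingle size `k=4` and the `0.5` threshold are all hypothetical choices made for illustration.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny keyword set; a real system would use per-language lexers.
KEYWORDS = {"int", "input", "print", "for", "while", "if", "else", "return", "in", "range"}

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def tokenize(source: str) -> list[str]:
    """Split source into identifiers, integer literals and single non-space characters."""
    return TOKEN_RE.findall(source)


def normalize(tokens: list[str]) -> list[str]:
    """Replace every non-keyword identifier with 'ID' so renamed variables still match."""
    return ["ID" if IDENT_RE.fullmatch(t) and t not in KEYWORDS else t for t in tokens]


def shingles(tokens: list[str], k: int) -> set[tuple[str, ...]]:
    """Return the set of contiguous k-token windows (or the whole sequence if shorter)."""
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def problem_similarity(sols_a: list[str], sols_b: list[str], k: int = 4) -> float:
    """Maximum Jaccard similarity over all pairs of accepted solutions of two problems."""
    sigs_a = [shingles(normalize(tokenize(s)), k) for s in sols_a]
    sigs_b = [shingles(normalize(tokenize(s)), k) for s in sols_b]
    return max((jaccard(a, b) for a in sigs_a for b in sigs_b), default=0.0)


def similar_pairs(problems: dict[str, list[str]], threshold: float = 0.5, k: int = 4):
    """Return (problem, problem, score) for every pair scoring at or above the threshold."""
    result = []
    for p, q in combinations(sorted(problems), 2):
        score = problem_similarity(problems[p], problems[q], k)
        if score >= threshold:
            result.append((p, q, score))
    return result


# Toy data: A and B differ only in variable names; C solves a different task.
problems = {
    "A": ["n = int(input())\nprint(n * (n + 1) // 2)"],
    "B": ["m = int(input())\nprint(m * (m + 1) // 2)"],
    "C": ["s = input()\nprint(s[::-1])"],
}
print(similar_pairs(problems))  # [('A', 'B', 1.0)]
```

Taking the maximum over solution pairs flags two problems as soon as any pair of their accepted solutions are near-clones; averaging, or comparing embeddings from a pretrained code model as the title suggests the project does, would be less sensitive to a single coincidental match.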

Bibliographic Details
Main Author: Yeo, Geremie Yun Siang
Other Authors: Anwitaman Datta
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/165850
Institution: Nanyang Technological University
Contributors: Anwitaman Datta; Patrick Pun Chi Seng
School: School of Computer Science and Engineering
Contact: Anwitaman@ntu.edu.sg, cspun@ntu.edu.sg
Degree: Bachelor of Science in Mathematical and Computer Sciences
Date Available: 2023-04-14
Project ID: SCSE22-0384
File Format: application/pdf
Citation: Yeo, G. Y. S. (2023). Code problem similarity detection using code clones and pretrained models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165850
Collection: DR-NTU, NTU Library