Code problem similarity detection using code clones and pretrained models
There are many websites hosting code contests, such as Leetcode, Codeforces and Codechef. On average, these contests attract 20k technology enthusiasts, as achieving a good rank can improve their problem-solving skills and enhance their resume during a job search. Thes...
Main Author: | Yeo, Geremie Yun Siang |
---|---|
Other Authors: | Anwitaman Datta |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/165850 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-165850 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-165850 2023-04-14T15:37:20Z Code problem similarity detection using code clones and pretrained models Yeo, Geremie Yun Siang Anwitaman Datta Patrick Pun Chi Seng School of Computer Science and Engineering Anwitaman@ntu.edu.sg, cspun@ntu.edu.sg Engineering::Computer science and engineering There are many websites hosting code contests, such as Leetcode, Codeforces and Codechef. On average, these contests attract 20k technology enthusiasts, as achieving a good rank can improve their problem-solving skills and enhance their resume during a job search. These contests typically support solving code problems in multiple programming languages, such as Python, C++ and Java. However, due to the vast number of code problems on these sites, it is inevitable that some will be duplicates of, or very similar to, one another. Duplicated code problems during a contest are not ideal, as contestants may copy solution source code from an older problem published before the contest, gaining undeserved points and making the standings unfair. This paper proposes a solution to detect similar code problems on Codeforces, the world’s most popular competitive programming website with over 100k active users. Similarity between problems is determined from their accepted solution source code, not from the problem text. Bachelor of Science in Mathematical and Computer Sciences 2023-04-14T01:13:14Z 2023-04-14T01:13:14Z 2023 Final Year Project (FYP) Yeo, G. Y. S. (2023). Code problem similarity detection using code clones and pretrained models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/165850 https://hdl.handle.net/10356/165850 en SCSE22-0384 application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering |
spellingShingle |
Engineering::Computer science and engineering Yeo, Geremie Yun Siang Code problem similarity detection using code clones and pretrained models |
description |
There are many websites hosting code contests, such as Leetcode, Codeforces and Codechef. On average, these contests attract 20k technology enthusiasts, as achieving a good rank can improve their problem-solving skills and enhance their resume during a job search. These contests typically support solving code problems in multiple programming languages, such as Python, C++ and Java. However, due to the vast number of code problems on these sites, it is inevitable that some will be duplicates of, or very similar to, one another. Duplicated code problems during a contest are not ideal, as contestants may copy solution source code from an older problem published before the contest, gaining undeserved points and making the standings unfair. This paper proposes a solution to detect similar code problems on Codeforces, the world’s most popular competitive programming website with over 100k active users. Similarity between problems is determined from their accepted solution source code, not from the problem text. |
author2 |
Anwitaman Datta |
author_facet |
Anwitaman Datta Yeo, Geremie Yun Siang |
format |
Final Year Project |
author |
Yeo, Geremie Yun Siang |
author_sort |
Yeo, Geremie Yun Siang |
title |
Code problem similarity detection using code clones and pretrained models |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/165850 |
_version_ |
1764208051123912704 |