Classifying source code: How far can compressor-based classifiers go?
Pre-trained language models of code, built on large-scale datasets with millions of trainable parameters and high computational cost, have achieved phenomenal success. Recently, researchers have proposed a compressor-based classifier (Cbc): it trains no parameters yet has been reported to outperform BERT. We conduct the first empirical study exploring whether this lightweight alternative can accurately classify source code. Our study goes beyond simply applying Cbc to code-related tasks. We first identify an issue in the original implementation that overestimates Cbc's performance. After correction, Cbc's accuracy on defect prediction drops from 80.7% to 63.0%, which is still comparable to CodeBERT (63.7%). We also find that hyperparameter settings affect performance. Moreover, results show that Cbc can outperform CodeBERT when the training data is small, making it a good alternative in low-resource settings.
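The record does not spell out how a compressor-based classifier works. Classifiers of this family are typically built from the Normalized Compression Distance (NCD), computed with an off-the-shelf compressor such as gzip, followed by a k-nearest-neighbour vote; they train no parameters. The sketch below illustrates that idea under those assumptions (the function names and training-set shape are illustrative, not taken from the paper):

```python
import gzip

def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance: how much extra space does
    # compressing x and y together need, relative to each alone?
    cx = len(gzip.compress(x))
    cy = len(gzip.compress(y))
    cxy = len(gzip.compress(x + b" " + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(snippet: str, train: list[tuple[str, str]], k: int = 1) -> str:
    # k-nearest-neighbour vote over NCD distances to labelled snippets;
    # no parameters are learned, only compressed lengths are compared.
    dists = sorted(
        (ncd(snippet.encode(), code.encode()), label) for code, label in train
    )
    top = [label for _, label in dists[:k]]
    return max(set(top), key=top.count)
```

With k = 1 there is no tie among neighbours to break; how ties and top-k matches are counted during evaluation is exactly the kind of detail that can inflate a classifier's reported accuracy.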
Main Author: | YANG, Zhou
---|---
Format: | text
Language: | English
Published: | Institutional Knowledge at Singapore Management University, 2024
Subjects: | Defect Software Prediction; Efficient Learning; Robustness; Software Engineering
Online Access: | https://ink.library.smu.edu.sg/sis_research/8920 https://ink.library.smu.edu.sg/context/sis_research/article/9923/viewcontent/3639478.3641229_pvoa_cc_by.pdf
Institution: | Singapore Management University
Field | Value
---|---
id | sg-smu-ink.sis_research-9923
record_format | dspace
spelling | sg-smu-ink.sis_research-9923 2024-10-17T06:04:06Z Classifying source code: How far can compressor-based classifiers go? YANG, Zhou Pre-trained language models of code, built on large-scale datasets with millions of trainable parameters and high computational cost, have achieved phenomenal success. Recently, researchers have proposed a compressor-based classifier (Cbc): it trains no parameters yet has been reported to outperform BERT. We conduct the first empirical study exploring whether this lightweight alternative can accurately classify source code. Our study goes beyond simply applying Cbc to code-related tasks. We first identify an issue in the original implementation that overestimates Cbc's performance. After correction, Cbc's accuracy on defect prediction drops from 80.7% to 63.0%, which is still comparable to CodeBERT (63.7%). We also find that hyperparameter settings affect performance. Moreover, results show that Cbc can outperform CodeBERT when the training data is small, making it a good alternative in low-resource settings. 2024-04-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8920 info:doi/10.1145/3639478.3641229 https://ink.library.smu.edu.sg/context/sis_research/article/9923/viewcontent/3639478.3641229_pvoa_cc_by.pdf http://creativecommons.org/licenses/by/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Defect Software Prediction Efficient Learning Robustness Software Engineering
institution | Singapore Management University
building | SMU Libraries
continent | Asia
country | Singapore
content_provider | SMU Libraries
collection | InK@SMU
language | English
topic | Defect Software Prediction; Efficient Learning; Robustness; Software Engineering
spellingShingle | Defect Software Prediction; Efficient Learning; Robustness; Software Engineering; YANG, Zhou; Classifying source code: How far can compressor-based classifiers go?
description | Pre-trained language models of code, built on large-scale datasets with millions of trainable parameters and high computational cost, have achieved phenomenal success. Recently, researchers have proposed a compressor-based classifier (Cbc): it trains no parameters yet has been reported to outperform BERT. We conduct the first empirical study exploring whether this lightweight alternative can accurately classify source code. Our study goes beyond simply applying Cbc to code-related tasks. We first identify an issue in the original implementation that overestimates Cbc's performance. After correction, Cbc's accuracy on defect prediction drops from 80.7% to 63.0%, which is still comparable to CodeBERT (63.7%). We also find that hyperparameter settings affect performance. Moreover, results show that Cbc can outperform CodeBERT when the training data is small, making it a good alternative in low-resource settings.
format | text
author | YANG, Zhou
author_facet | YANG, Zhou
author_sort | YANG, Zhou
title | Classifying source code: How far can compressor-based classifiers go?
title_short | Classifying source code: How far can compressor-based classifiers go?
title_full | Classifying source code: How far can compressor-based classifiers go?
title_fullStr | Classifying source code: How far can compressor-based classifiers go?
title_full_unstemmed | Classifying source code: How far can compressor-based classifiers go?
title_sort | classifying source code: how far can compressor-based classifiers go?
publisher | Institutional Knowledge at Singapore Management University
publishDate | 2024
url | https://ink.library.smu.edu.sg/sis_research/8920 https://ink.library.smu.edu.sg/context/sis_research/article/9923/viewcontent/3639478.3641229_pvoa_cc_by.pdf
_version_ | 1814047946855940096