Towards robust models of code via energy-based learning on auxiliary datasets
Existing approaches to improving the robustness of source code models concentrate on recognizing adversarial samples rather than valid samples that fall outside of a given distribution, which we refer to as out-of-distribution (OOD) samples. To this end, we propose to use an auxiliary dataset (out-o...
Main Authors: | BUI, Duy Quoc Nghi; YU, Yijun |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2022 |
Subjects: | Software Engineering |
Online Access: | https://ink.library.smu.edu.sg/sis_research/10117 https://ink.library.smu.edu.sg/context/sis_research/article/11117/viewcontent/RobustModelsCode_pv.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-11117 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-11117 2025-02-21T04:13:36Z Towards robust models of code via energy-based learning on auxiliary datasets BUI, Duy Quoc Nghi YU, Yijun 2022-10-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/10117 info:doi/10.1145/3551349.3561171 https://ink.library.smu.edu.sg/context/sis_research/article/11117/viewcontent/RobustModelsCode_pv.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Software Engineering |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Software Engineering |
description |
Existing approaches to improving the robustness of source code models concentrate on recognizing adversarial samples rather than valid samples that fall outside of a given distribution, which we refer to as out-of-distribution (OOD) samples. To this end, we propose to use an auxiliary (out-of-distribution) dataset that, when used in training together with the main dataset, enhances the model’s robustness. To incorporate such out-of-distribution samples into the training process of source code models, we adapt an energy-bounded learning objective function that assigns a higher score to in-distribution samples and a lower score to out-of-distribution samples. In terms of OOD detection and adversarial sample detection, our evaluation results show that existing source code models become more accurate at recognizing OOD data while also being more resistant to adversarial attacks. |
format |
text |
author |
BUI, Duy Quoc Nghi YU, Yijun |
title |
Towards robust models of code via energy-based learning on auxiliary datasets |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2022 |
url |
https://ink.library.smu.edu.sg/sis_research/10117 https://ink.library.smu.edu.sg/context/sis_research/article/11117/viewcontent/RobustModelsCode_pv.pdf |