Towards robust models of code via energy-based learning on auxiliary datasets

Bibliographic Details
Main Authors: BUI, Duy Quoc Nghi, YU, Yijun
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/10117
https://ink.library.smu.edu.sg/context/sis_research/article/11117/viewcontent/RobustModelsCode_pv.pdf
Institution: Singapore Management University
Description
Summary: Existing approaches to improving the robustness of source code models concentrate on recognizing adversarial samples rather than valid samples that fall outside a given distribution, which we refer to as out-of-distribution (OOD) samples. To address this, we propose training with an auxiliary (out-of-distribution) dataset alongside the main dataset to enhance the model's robustness. To incorporate such out-of-distribution samples into the training process of source code models, we adapt an energy-bounded learning objective that assigns a higher score to in-distribution samples and a lower score to out-of-distribution samples. Our evaluation shows that this makes existing source code models more accurate at recognizing OOD data while, at the same time, more resistant to adversarial attacks.
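The energy-bounded objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the margin values (`m_in`, `m_out`) and the hinge-penalty form are assumptions, following the common energy-based OOD formulation in which in-distribution samples are pushed toward low energy (equivalently, a high negative-energy score) and auxiliary OOD samples toward high energy.

```python
import math

def energy_score(logits, temperature=1.0):
    """Energy of one sample: E(x) = -T * log(sum_k exp(f_k(x) / T)).

    Lower energy indicates in-distribution; -E(x) is the "score"
    that the objective drives higher for in-distribution samples.
    Uses the max-shift trick for numerical stability.
    """
    m = max(l / temperature for l in logits)
    s = sum(math.exp(l / temperature - m) for l in logits)
    return -temperature * (m + math.log(s))

def energy_bounded_loss(logits_in, logits_out, m_in=-5.0, m_out=-1.0):
    """Squared-hinge penalties on per-sample energies (illustrative margins).

    In-distribution energies above m_in and auxiliary-OOD energies
    below m_out are penalized; this term would be added to the usual
    classification loss on the main dataset.
    """
    e_in = [energy_score(l) for l in logits_in]
    e_out = [energy_score(l) for l in logits_out]
    loss_in = sum(max(0.0, e - m_in) ** 2 for e in e_in) / len(e_in)
    loss_out = sum(max(0.0, m_out - e) ** 2 for e in e_out) / len(e_out)
    return loss_in + loss_out
```

A confidently classified sample (one dominant logit) has low energy, while a sample with flat logits has energy near -log(K); when the two batches fall on the correct sides of their margins, the penalty vanishes and only the main-task loss drives training.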