FAIR: Flow type-aware pre-training of compiler intermediate representations

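While most existing pre-trained models of code learn source-code features such as code tokens and abstract syntax trees, some other works focus on learning from compiler intermediate representations (IRs). Existing IR-based models typically use IR features such as instructions, control and data flow graphs (CDFGs), and call graphs. However, these methods confuse variable nodes and instruction nodes in a CDFG and fail to distinguish different types of flows, and the neural networks they use fail to capture long-distance dependencies and suffer from over-smoothing and over-squashing. To address these weaknesses, we propose FAIR, a Flow type-Aware pre-trained model for IR that employs (1) a novel input representation of IR programs; (2) a Graph Transformer to address the over-smoothing, over-squashing, and long-range dependency problems; and (3) five pre-training tasks that we specifically propose to enable FAIR to learn the semantics of IR tokens, flow type information, and the overall representation of IR. Experimental results show that FAIR achieves state-of-the-art results on four code-related downstream tasks.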
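
The abstract names FAIR's ingredients but not their exact form. The Python sketch below is a hedged illustration under stated assumptions, not FAIR's implementation. It assumes a ProGraML-style graph in which variables and instructions are separate node kinds and every edge carries a flow type. The CONTROL, DATA, and CALL labels are assumed from the abstract's mention of control/data flow graphs and call graphs. It then applies one Graph-Transformer-style self-attention step in which each node attends to every node, with an additive scalar bias chosen by the flow type connecting the pair. All names (`NodeKind`, `FlowType`, `IRGraph`, `node_features`, `attention_weights`, `flow_biased_attention`) and the bias values are hypothetical.

```python
import math
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    """Variables and instructions are kept as distinct node kinds."""
    INSTRUCTION = 0
    VARIABLE = 1


class FlowType(Enum):
    """Hypothetical set of edge (flow) types; FAIR's actual set may differ."""
    CONTROL = 0
    DATA = 1
    CALL = 2


@dataclass
class IRGraph:
    kinds: list = field(default_factory=list)   # NodeKind of each node
    labels: list = field(default_factory=list)  # IR token text of each node
    edges: dict = field(default_factory=dict)   # (src, dst) -> FlowType

    def add_node(self, kind: NodeKind, label: str) -> int:
        self.kinds.append(kind)
        self.labels.append(label)
        return len(self.kinds) - 1

    def add_edge(self, src: int, dst: int, flow: FlowType) -> None:
        self.edges[(src, dst)] = flow


def node_features(graph: IRGraph) -> list:
    """One-hot encoding of the node kind, so the two kinds never collapse."""
    return [[1.0 if kind.value == k else 0.0 for k in range(len(NodeKind))]
            for kind in graph.kinds]


def attention_weights(h: list, graph: IRGraph, bias: dict, i: int) -> list:
    """Softmax attention of node i over ALL nodes (not only graph neighbours).
    A flow-type bias is added when an edge links i and j in either direction;
    if both directions exist, the (i, j) edge wins."""
    d = len(h[0])
    scores = []
    for j in range(len(h)):
        s = sum(p * q for p, q in zip(h[i], h[j])) / math.sqrt(d)
        flow = graph.edges.get((i, j), graph.edges.get((j, i)))
        if flow is not None:
            s += bias[flow]
        scores.append(s)
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flow_biased_attention(h: list, graph: IRGraph, bias: dict) -> list:
    """One attention layer without learned projections: each output row is
    the attention-weighted sum of all input rows."""
    out = []
    for i in range(len(h)):
        w = attention_weights(h, graph, bias, i)
        out.append([sum(w[j] * h[j][k] for j in range(len(h)))
                    for k in range(len(h[0]))])
    return out


# Hypothetical LLVM-style snippet:
#   %a = load i32, ptr %x
#   %b = add i32 %a, 1      (the constant operand is omitted here)
#   store i32 %b, ptr %y
g = IRGraph()
x = g.add_node(NodeKind.VARIABLE, "%x")
load = g.add_node(NodeKind.INSTRUCTION, "load")
a = g.add_node(NodeKind.VARIABLE, "%a")
add = g.add_node(NodeKind.INSTRUCTION, "add")
b = g.add_node(NodeKind.VARIABLE, "%b")
store = g.add_node(NodeKind.INSTRUCTION, "store")
y = g.add_node(NodeKind.VARIABLE, "%y")
for src, dst in [(x, load), (load, a), (a, add), (add, b), (b, store), (y, store)]:
    g.add_edge(src, dst, FlowType.DATA)
for src, dst in [(load, add), (add, store)]:
    g.add_edge(src, dst, FlowType.CONTROL)

h = node_features(g)
# Hypothetical fixed values; in a trained model these would be learned.
bias = {FlowType.CONTROL: 1.0, FlowType.DATA: 2.0, FlowType.CALL: 0.5}
weights = attention_weights(h, g, bias, add)
out = flow_biased_attention(h, g, bias)
```

Because the attention in this sketch spans all node pairs, `%x` influences `add` directly even though the two are two hops apart in the graph. A message-passing GNN would need two layers and would route that signal through `load`, which is the intuition behind the over-squashing and long-range dependency problems the abstract mentions. Adding a per-edge-type bias to attention scores is one common way to make a Graph Transformer aware of edge types; the abstract does not say whether FAIR uses this mechanism or a different one.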


Bibliographic Details
Main Authors: NIU, Changan; LI, Chuanyi; NG, Vincent; LO, David; LUO, Bin
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects: Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/9265
Institution: Singapore Management University
id sg-smu-ink.sis_research-10265
record_format dspace
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU (Research Collection School Of Computing and Information Systems)
language English
topic Software Engineering
description While most existing pre-trained models of code learn source-code features such as code tokens and abstract syntax trees, some other works focus on learning from compiler intermediate representations (IRs). Existing IR-based models typically use IR features such as instructions, control and data flow graphs (CDFGs), and call graphs. However, these methods confuse variable nodes and instruction nodes in a CDFG and fail to distinguish different types of flows, and the neural networks they use fail to capture long-distance dependencies and suffer from over-smoothing and over-squashing. To address these weaknesses, we propose FAIR, a Flow type-Aware pre-trained model for IR that employs (1) a novel input representation of IR programs; (2) a Graph Transformer to address the over-smoothing, over-squashing, and long-range dependency problems; and (3) five pre-training tasks that we specifically propose to enable FAIR to learn the semantics of IR tokens, flow type information, and the overall representation of IR. Experimental results show that FAIR achieves state-of-the-art results on four code-related downstream tasks.
publishDate 2024 (2024-04-20)