AutoPruner: transformer-based call graph pruning

Constructing a static call graph requires trade-offs between soundness and precision. Program analysis techniques for constructing call graphs are unfortunately usually imprecise. To address this problem, researchers have recently proposed call graph pruning empowered by machine learning to post-process call graphs constructed by static analysis. A machine learning model is built to capture information from the call graph by extracting structural features for use in a random forest classifier. It then removes edges that are predicted to be false positives. Despite the improvements shown by machine learning models, they are still limited as they do not consider source code semantics and thus are often unable to effectively distinguish true and false positives.

In this paper, we present a novel call graph pruning technique, AutoPruner, for eliminating false positives in call graphs via both statistical semantic and structural analysis. Given a call graph constructed by traditional static analysis tools, AutoPruner takes a Transformer-based approach to capture the semantic relationships between the caller and callee functions associated with each edge in the call graph. To do so, AutoPruner fine-tunes a model of code that was pre-trained on a large corpus to represent source code based on descriptions of its semantics. Next, the model is used to extract semantic features from the functions related to each edge in the call graph. AutoPruner uses these semantic features together with the structural features extracted from the call graph to classify each edge via a feed-forward neural network.

Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines, improving F-measure by up to 13% in identifying false-positive edges in a static call graph. Moreover, AutoPruner achieves improvements on two client analyses, including halving the false alarm rate on null pointer analysis and over 10% improvement on monomorphic call-site detection. Additionally, our ablation study and qualitative analysis show that the semantic features extracted by AutoPruner capture a remarkable amount of information for distinguishing between true and false positives.
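The pipeline the abstract describes, concatenating per-edge semantic and structural features and scoring each edge with a classifier, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the call graph, the feature choices, the untrained weights, and the `semantic_features` stub (which stands in for embeddings from the fine-tuned Transformer) are all assumptions made for the example.

```python
import math

# A static call graph as an adjacency list: caller -> set of callees
# (illustrative example, not a graph from the paper).
call_graph = {
    "main": {"parse", "run"},
    "parse": {"tokenize"},
    "run": {"parse", "log"},
    "tokenize": set(),
    "log": set(),
}

def structural_features(graph, caller, callee):
    """Simple per-edge structural features: caller fan-out, callee fan-in."""
    fan_out = len(graph.get(caller, ()))
    fan_in = sum(1 for callees in graph.values() if callee in callees)
    return [float(fan_out), float(fan_in)]

def semantic_features(caller, callee, dim=4):
    """Stand-in for the semantic features of a caller/callee pair.

    In AutoPruner these come from a fine-tuned pre-trained Transformer over
    the two functions' source code; here we just derive a deterministic
    pseudo-embedding so the sketch is runnable."""
    seed = hash((caller, callee)) % 1000
    return [math.sin(seed + i) for i in range(dim)]

def edge_score(graph, caller, callee, weights, bias=0.0):
    """One-layer stand-in for the feed-forward classifier: a weighted sum of
    the concatenated semantic + structural features, squashed by a sigmoid."""
    feats = semantic_features(caller, callee) + structural_features(graph, caller, callee)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def prune(graph, weights, threshold=0.5):
    """Keep only edges the classifier scores as likely true positives."""
    return {
        caller: {c for c in callees if edge_score(graph, caller, c, weights) >= threshold}
        for caller, callees in graph.items()
    }

weights = [0.1] * 4 + [0.5, 0.5]  # illustrative, untrained weights
pruned = prune(call_graph, weights)
```

In the actual approach the classifier is a trained feed-forward network and the semantic features are learned; the sketch only shows the data flow: features per edge, a score per edge, and removal of edges below a threshold.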

Bibliographic Details
Main Authors: LE, Cong Thanh, KANG, Hong Jin, NGUYEN, Truong Giang, AGUS HARYONO, Stefanus, LO, David, LE, Xuan-Bach D., THANG, Huynh Quyet
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
Subjects: Call graph pruning; Static analysis; Pretrained language model; Transformer; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/7740
https://ink.library.smu.edu.sg/context/sis_research/article/8743/viewcontent/2209.03230.pdf
Institution: Singapore Management University
Date Published: 2022-11-01
DOI: 10.1145/3540250.3549175
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems