Active learning of discriminative subgraph patterns for API misuse detection

A common cause of bugs and vulnerabilities is the violation of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while techniques have been proposed to detect such misuses, studies have shown that they fail to reliab...

Bibliographic Details
Main Authors: KANG, Hong Jin, LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2022
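Subjects: API-Misuse Detection; Discriminative Subgraph Mining; Graph Classification; Active Learning; Software Engineering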
Online Access: https://ink.library.smu.edu.sg/sis_research/7635
https://ink.library.smu.edu.sg/context/sis_research/article/8638/viewcontent/2204.09945.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8638
record_format dspace
spelling sg-smu-ink.sis_research-8638
2023-01-10T03:55:43Z
Active learning of discriminative subgraph patterns for API misuse detection
KANG, Hong Jin
LO, David
2022-02-01T08:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/7635
info:doi/10.1109/TSE.2021.3069978
https://ink.library.smu.edu.sg/context/sis_research/article/8638/viewcontent/2204.09945.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
API-Misuse Detection
Discriminative Subgraph Mining
Graph Classification
Active Learning
Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic API-Misuse Detection
Discriminative Subgraph Mining
Graph Classification
Active Learning
Software Engineering
description A common cause of bugs and vulnerabilities is the violation of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while techniques have been proposed to detect such misuses, studies have shown that they fail to reliably detect misuses while reporting many false positives. One limitation of prior work is the inability to reliably identify correct patterns of usage. Many approaches mistake a usage pattern’s frequency for correctness. Because many alternative usage patterns are uncommon but correct, anomaly detection-based techniques have limited success in identifying misuses. We address these challenges and propose ALP (Actively Learned Patterns), reformulating API misuse detection as a classification problem. After representing programs as graphs, ALP mines discriminative subgraphs. While ALP still incorporates frequency information, limited human supervision reduces its reliance on the assumption that frequent usage patterns are correct. The principles of active learning are incorporated to shift human attention away from the most frequent patterns. Instead, ALP samples informative and representative examples while minimizing labeling effort. In our empirical evaluation, ALP substantially outperforms prior approaches on both MUBench, an API misuse benchmark, and a new dataset that we constructed from real-world software projects.
format text
author KANG, Hong Jin
LO, David
title Active learning of discriminative subgraph patterns for API misuse detection
publisher Institutional Knowledge at Singapore Management University
publishDate 2022
url https://ink.library.smu.edu.sg/sis_research/7635
https://ink.library.smu.edu.sg/context/sis_research/article/8638/viewcontent/2204.09945.pdf