Cross-project defect prediction via ASTToken2Vec and BLSTM-based neural network

Cross-project defect prediction (CPDP), as a means to focus the quality assurance of software projects, has been under heavy investigation in recent years. In this paper, we propose a novel CPDP approach via deep learning. In particular, we model each program module via a simplified abstract syntax tree (S-AST)....

Bibliographic Details
Main Authors: LI, Hao; LI, Xiaohong; CHEN, Xiang; XIE, Xiaofei; MU, Yanzhou; FENG, Zhiyong
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
Subjects: OS and Networks; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/7094
Institution: Singapore Management University
id sg-smu-ink.sis_research-8097
record_format dspace
spelling sg-smu-ink.sis_research-8097 2022-04-07T06:06:03Z Cross-project defect prediction via ASTToken2Vec and BLSTM-based neural network LI, Hao LI, Xiaohong CHEN, Xiang XIE, Xiaofei MU, Yanzhou FENG, Zhiyong 2019-07-19T07:00:00Z text https://ink.library.smu.edu.sg/sis_research/7094 info:doi/10.1109/IJCNN.2019.8852135 Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University OS and Networks Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic OS and Networks
Software Engineering
description Cross-project defect prediction (CPDP), as a means to focus the quality assurance of software projects, has been under heavy investigation in recent years. In this paper, we propose a novel CPDP approach via deep learning. In particular, we model each program module via a simplified abstract syntax tree (S-AST). For each node in the S-AST, only the project-independent node type is retained, and other project-specific information (such as the names of variables and methods) is ignored, so that the modeling method is project-independent and suitable for CPDP. Then we extract token sequences from program modules modeled as S-ASTs. In addition, to construct meaningful vector representations for token sequences, we propose a novel unsupervised embedding method, ASTToken2Vec, which learns semantic information from the S-AST's natural structure. Finally, we use a BLSTM (bi-directional long short-term memory) based neural network to automatically learn semantic features from vectorized token sequences and construct CPDP models. In our empirical studies, 10 real large-scale open source Java projects are chosen as our empirical subjects. Final results show that our proposed CPDP approach performs significantly better than 5 state-of-the-art CPDP baselines in terms of AUC.
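The idea of keeping only project-independent node types can be illustrated with a minimal sketch. It is not the authors' implementation: the paper works on Java projects, whereas this example assumes Python's standard `ast` module as a stand-in parser, and the function name `node_type_tokens` and the pre-order traversal are hypothetical choices made for illustration.

```python
import ast


def node_type_tokens(source: str) -> list:
    """Return a pre-order sequence of AST node type names for `source`.

    Identifiers, literals, and other project-specific details are dropped;
    only the node type survives, in the spirit of the S-AST described above.
    (Assumption: Python's `ast` stands in for a Java AST parser.)
    """
    tokens = []

    def visit(node: ast.AST) -> None:
        # Skip Load/Store/Del context markers, which add no structure.
        if isinstance(node, ast.expr_context):
            return
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


# Two modules that differ only in naming yield the same token sequence.
snippet_a = "def add(x, y):\n    return x + y\n"
snippet_b = "def plus(left, right):\n    return left + right\n"
# A structurally different module yields a different sequence.
snippet_c = "def add(x, y):\n    if x:\n        return y\n    return x\n"

tokens_a = node_type_tokens(snippet_a)
tokens_b = node_type_tokens(snippet_b)
tokens_c = node_type_tokens(snippet_c)
```

Because variable and method names never enter the sequence, `tokens_a` and `tokens_b` are identical even though the two snippets share no identifiers, which is the property that makes such a representation usable across projects. A sequence like this would then be mapped to vectors by an embedding step and passed to a sequence model; those stages are not sketched here.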
format text
author LI, Hao
LI, Xiaohong
CHEN, Xiang
XIE, Xiaofei
MU, Yanzhou
FENG, Zhiyong
author_sort LI, Hao
title Cross-project defect prediction via ASTToken2Vec and BLSTM-based neural network
title_sort cross-project defect prediction via asttoken2vec and blstm-based neural network
publisher Institutional Knowledge at Singapore Management University
publishDate 2019
url https://ink.library.smu.edu.sg/sis_research/7094
_version_ 1770576211028738048