Cross-project defect prediction via ASTToken2Vec and BLSTM-based neural network
Cross-project defect prediction (CPDP), as a means of focusing quality assurance effort in software projects, has been under heavy investigation in recent years. In this paper, we propose a novel CPDP approach based on deep learning. In particular, we model each program module as a simplified abstract syntax tree (S-AST)....
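The idea behind the S-AST modeling step, keeping only project-independent node types and dropping project-specific names, can be illustrated with a minimal sketch. The paper works on Java projects; the sketch below uses Python's standard `ast` module purely as a stand-in, and the function name `node_type_tokens`, the depth-first pre-order traversal, and the filtering of `Load`/`Store` context nodes are all assumptions made for illustration, not the authors' implementation.

```python
import ast


def node_type_tokens(source):
    """Return the AST node-type names of `source` in depth-first pre-order.

    Identifiers and literal values are never recorded, only the node
    types, so two modules that differ only in naming yield the same
    token sequence. (Traversal order is an assumption of this sketch.)
    """
    tokens = []

    def visit(node):
        # Skip Load/Store/Del markers: they are a Python-specific detail
        # and carry no structural information for this illustration.
        if isinstance(node, ast.expr_context):
            return
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


# Two functions with the same structure but different names.
first = "def add(x, y):\n    return x + y\n"
second = "def plus(left, right):\n    return left + right\n"

print(node_type_tokens(first) == node_type_tokens(second))
```

Because the names `add`, `plus`, `x`, and `left` never enter the token sequence, the comparison does not depend on project-specific vocabulary, which is the property the abstract relies on for cross-project use. Token sequences of this kind would then be embedded and fed to a sequence model; that part is not sketched here.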
Main Authors: | LI, Hao; LI, Xiaohong; CHEN, Xiang; XIE, Xiaofei; MU, Yanzhou; FENG, Zhiyong |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2019 |
Subjects: | OS and Networks; Software Engineering |
Online Access: | https://ink.library.smu.edu.sg/sis_research/7094 |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-8097 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-8097; 2022-04-07T06:06:03Z; Cross-project defect prediction via ASTToken2Vec and BLSTM-based neural network; LI, Hao; LI, Xiaohong; CHEN, Xiang; XIE, Xiaofei; MU, Yanzhou; FENG, Zhiyong; 2019-07-19T07:00:00Z; text; https://ink.library.smu.edu.sg/sis_research/7094; info:doi/10.1109/IJCNN.2019.8852135; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; OS and Networks; Software Engineering |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
OS and Networks; Software Engineering |
description |
Cross-project defect prediction (CPDP), as a means of focusing quality assurance effort in software projects, has been under heavy investigation in recent years. In this paper, we propose a novel CPDP approach based on deep learning. In particular, we model each program module as a simplified abstract syntax tree (S-AST). For each node in the S-AST, only the project-independent node type is retained; other project-specific information (such as variable and method names) is ignored, so that the modeling method is project-independent and suitable for CPDP. We then extract token sequences from the program modules modeled as S-ASTs. In addition, to construct meaningful vector representations for the token sequences, we propose a novel unsupervised embedding method, ASTToken2Vec, which learns semantic information from the natural structure of the S-AST. Finally, we use a BLSTM (bi-directional long short-term memory) based neural network to automatically learn semantic features from the vectorized token sequences and construct CPDP models. In our empirical studies, 10 real-world, large-scale open source Java projects are chosen as subjects. The results show that our proposed CPDP approach performs significantly better than 5 state-of-the-art CPDP baselines in terms of AUC. |
format |
text |
author |
LI, Hao; LI, Xiaohong; CHEN, Xiang; XIE, Xiaofei; MU, Yanzhou; FENG, Zhiyong |
author_sort |
LI, Hao |
title |
Cross-project defect prediction via ASTToken2Vec and BLSTM-based neural network |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2019 |
url |
https://ink.library.smu.edu.sg/sis_research/7094 |