Combining Software Metrics and Text Features for Vulnerable File Prediction

In recent years, to help developers reduce the time and effort required to build highly secure software, a number of prediction models built on different kinds of features have been proposed to identify vulnerable source code files. In this paper, we propose a novel approach, VULPREDICTOR, to pr...

Bibliographic Details
Main Authors: ZHANG, Yun; LO, David; XIA, Xin; XU, Bowen; SUN, Jianling; LI, Shanping
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2015
Subjects: Machine Learning; Text Mining; Vulnerable File; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/3097
Institution: Singapore Management University
id sg-smu-ink.sis_research-4097
record_format dspace
spelling sg-smu-ink.sis_research-4097 2016-02-05T06:30:05Z 2015-12-11T08:00:00Z text https://ink.library.smu.edu.sg/sis_research/3097 info:doi/10.1109/ICECCS.2015.15 Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Machine Learning
Text Mining
Vulnerable File
Software Engineering
description In recent years, to help developers reduce the time and effort required to build highly secure software, a number of prediction models built on different kinds of features have been proposed to identify vulnerable source code files. In this paper, we propose a novel approach, VULPREDICTOR, to predict vulnerable files; it combines software metrics and text features to build a composite prediction model. VULPREDICTOR first builds 6 underlying classifiers on a training set of vulnerable and non-vulnerable files represented by their software metrics and text features, and then constructs a meta classifier to process the outputs of the 6 underlying classifiers. We evaluate our solution on datasets from three web applications (Drupal, PHPMyAdmin, and Moodle), which contain a total of 3,466 files and 223 vulnerabilities. The experimental results show that VULPREDICTOR can achieve F1 and EffectivenessRatio@20% scores of up to 0.683 and 75%, respectively. On average across the 3 projects, VULPREDICTOR improves the F1 and EffectivenessRatio@20% scores of the best-performing state-of-the-art approaches proposed by Walden et al. by 46.53% and 14.93%, respectively.
format text
author ZHANG, Yun
LO, David
XIA, Xin
XU, Bowen
SUN, Jianling
LI, Shanping
author_sort ZHANG, Yun
title Combining Software Metrics and Text Features for Vulnerable File Prediction
publisher Institutional Knowledge at Singapore Management University
publishDate 2015
url https://ink.library.smu.edu.sg/sis_research/3097
_version_ 1770572808711045120
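
The description above outlines a two-layer design: underlying classifiers trained on software metrics and text features, and a meta classifier that processes their outputs. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes two underlying classifiers instead of six, uses a tiny hand-written logistic regression for every model, trains the meta classifier on in-sample outputs for brevity, and runs on invented toy data; all names (`train_logreg`, `base_outputs`, `predict`, `f1_score`) and all feature values are hypothetical.

```python
import math


def predict_proba(w, x):
    """Probability of the positive (vulnerable) class; w[-1] is the bias."""
    z = sum(wj * v for wj, v in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=500, lr=0.5):
    """Fit a tiny logistic regression with batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical toy data: one row per file, label 1 = vulnerable.
metrics = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
tokens = [[0.7, 0.1], [0.9, 0.2], [0.6, 0.3], [0.1, 0.8], [0.2, 0.9], [0.3, 0.7]]
labels = [1, 1, 1, 0, 0, 0]

# Layer 1: one underlying classifier per feature view (assumption: two, not six).
base_metric = train_logreg(metrics, labels)
base_text = train_logreg(tokens, labels)


def base_outputs(m, t):
    """Outputs of the underlying classifiers, used as meta-level features."""
    return [predict_proba(base_metric, m), predict_proba(base_text, t)]


# Layer 2: meta classifier trained on the underlying classifiers' outputs.
# (Simplification: a careful setup would use held-out outputs here.)
meta = train_logreg([base_outputs(m, t) for m, t in zip(metrics, tokens)], labels)


def predict(m, t):
    """Composite vulnerability score for a file with metrics m and text features t."""
    return predict_proba(meta, base_outputs(m, t))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    preds = [1 if predict(m, t) >= 0.5 else 0 for m, t in zip(metrics, tokens)]
    print("training-set F1:", f1_score(labels, preds))
```

Calling `predict(m, t)` returns a score between 0 and 1 for a single file, and `f1_score` computes the F1 measure named in the description from true and predicted labels. EffectivenessRatio@20% is not sketched here because the record does not define it.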