Bug or not? Bug Report classification using N-gram IDF

© 2017 IEEE. Previous studies have found that a significant number of bug reports are misclassified between bugs and nonbugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model with N-gram IDF, a theoretical exte...

Bibliographic Details
Main Authors: Pannavat Terdchanakul, Hideaki Hata, Passakorn Phannachitta, Kenichi Matsumoto
Format: Conference Proceeding
Published: 2018
Subjects: Engineering; Agricultural and Biological Sciences; Arts and Humanities
Online Access: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85040599854&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/46634
Institution: Chiang Mai University
id th-cmuir.6653943832-46634
record_format dspace
spelling th-cmuir.6653943832-46634 2018-04-25T07:36:02Z 2018-04-25T06:58:41Z 2018-04-25T06:58:41Z 2017-11-02 Conference Proceeding 2-s2.0-85040599854 10.1109/ICSME.2017.14
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
topic Engineering
Agricultural and Biological Sciences
Arts and Humanities
description © 2017 IEEE. Previous studies have found that a significant number of bug reports are misclassified between bugs and nonbugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts; these key terms can be used as features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and from topic modeling, which is widely used in various software engineering tasks. On a publicly available dataset, our N-gram IDF-based models outperform the topic-based models in all of the evaluated cases. Our models show promising results and have the potential to be extended to other software engineering tasks.
format Conference Proceeding
author Pannavat Terdchanakul
Hideaki Hata
Passakorn Phannachitta
Kenichi Matsumoto
title Bug or not? Bug Report classification using N-gram IDF
publishDate 2018
url https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85040599854&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/46634
_version_ 1681422910834606080