Characterizing and identifying reverted commits

In practice, a popular and coarse-grained approach for recovering from a problematic commit is to revert it (i.e., undoing the change). However, reverted commits could induce some issues for software development, such as impeding the development progress and increasing the difficulty for maintenance...

Bibliographic Details
Main Authors: YAN, Meng, XIA, Xin, LO, David, HASSAN, Ahmed E., LI, Shanping
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
Subjects: Empirical study; Identification model; Feature engineering; Reverted commits; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/4357
https://ink.library.smu.edu.sg/context/sis_research/article/5360/viewcontent/Reverted_commits_emse_2019_afv.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5360
Date: 2019-03-01T08:00:00Z
Media type: application/pdf
DOI: 10.1007/s10664-019-09688-8
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Series: Research Collection School Of Computing and Information Systems
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
topic Empirical study
Identification model
Feature engineering
Reverted commits
Software Engineering
description In practice, a popular and coarse-grained approach for recovering from a problematic commit is to revert it (i.e., undo the change). However, reverted commits can cause problems for software development, such as impeding development progress and making maintenance more difficult. To mitigate these issues, we set out to explore the following central question: can we characterize and identify which commits will be reverted? In this paper, we characterize commits using 27 commit features and build an identification model to identify commits that will be reverted. We first identify reverted commits by analyzing commit messages and comparing the changed content, and extract 27 commit features that fall into three dimensions: change, developer, and message. Then, we build an identification model (e.g., random forest) based on the extracted features. To evaluate the effectiveness of our proposed model, we perform an empirical study on ten open source projects comprising a total of 125,241 commits. Our experimental results show that our model outperforms two baselines in terms of AUC-ROC and cost-effectiveness (i.e., the percentage of reverted commits detected when inspecting 20% of the total changed LOC). Averaged across the ten studied projects, our model achieves an AUC-ROC of 0.756 and a cost-effectiveness of 0.746, significantly improving on the baselines by substantial margins. In addition, we found that “developer” is the most discriminative of the three feature dimensions for identifying reverted commits. However, using all three dimensions of commit features leads to better performance.
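The cost-effectiveness measure mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cost_effectiveness`, the tuple layout, ranking commits by descending predicted score, and stopping at the first commit that would exceed the LOC budget are all assumptions made for illustration; the paper may rank or truncate differently.

```python
from typing import List, Tuple

# Each commit is (predicted_score, changed_loc, was_reverted).
# This layout is hypothetical, chosen only for the example.
Commit = Tuple[float, int, bool]


def cost_effectiveness(commits: List[Commit], budget: float = 0.2) -> float:
    """Fraction of reverted commits found when inspecting `budget`
    (default 20%) of the total changed LOC, highest scores first."""
    total_loc = sum(loc for _, loc, _ in commits)
    total_reverted = sum(1 for _, _, reverted in commits if reverted)
    if total_loc == 0 or total_reverted == 0:
        return 0.0

    limit = budget * total_loc
    inspected_loc = 0
    found = 0
    # Assumption: inspect commits in order of descending predicted score.
    for _, loc, reverted in sorted(commits, key=lambda c: c[0], reverse=True):
        # Assumption: stop as soon as the next commit would exceed the budget.
        if inspected_loc + loc > limit:
            break
        inspected_loc += loc
        if reverted:
            found += 1
    return found / total_reverted


example = [
    (0.9, 10, True),   # inspected (10 LOC so far)
    (0.8, 5, False),   # inspected (15 LOC so far)
    (0.7, 30, True),   # would exceed the 20 LOC budget, so inspection stops
    (0.1, 55, True),
]
result = cost_effectiveness(example)  # 1 of 3 reverted commits found
```

With 100 changed LOC in total, the 20% budget covers 20 LOC, so only the first two commits are inspected and one of the three reverted commits is found.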