What is the vocabulary of flaky tests?

Flaky tests are tests whose outcomes are non-deterministic. Despite the recent research activity on this topic, no effort has been made on understanding the vocabulary of flaky tests. This work proposes to automatically classify tests as flaky or not based on their vocabulary. Static classification...

Bibliographic Details
Main Authors: PINTO, Gustavo; MIRANDA, Breno; DISSANAYAKE, Supun; D'AMORIM, Marcelo; TREUDE, Christoph; BERTOLINO, Antonia
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2020
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/8809
https://ink.library.smu.edu.sg/context/sis_research/article/9812/viewcontent/msr20a.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9812
record_format dspace
date 2020-10-01T07:00:00Z
file_format application/pdf
doi 10.1145/3379597.3387482
license http://creativecommons.org/licenses/by-nc-nd/4.0/
series Research Collection School Of Computing and Information Systems
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
topic Regression testing
Test flakiness
Text classification
Software Engineering
description Flaky tests are tests whose outcomes are non-deterministic. Despite the recent research activity on this topic, no effort has been made on understanding the vocabulary of flaky tests. This work proposes to automatically classify tests as flaky or not based on their vocabulary. Static classification of flaky tests is important, for example, to detect the introduction of flaky tests and to search for flaky tests after they are introduced in regression test suites. We evaluated the performance of various machine learning algorithms to solve this problem. We constructed a data set of flaky and non-flaky tests by running every test case, in a set of 64k tests, 100 times (6.4 million test executions). We then used machine learning techniques on the resulting data set to predict which tests are flaky from their source code. Based on features, such as counting stemmed tokens extracted from source code identifiers, we achieved an F-measure of 0.95 for the identification of flaky tests. The best prediction performance was obtained when using Random Forest and Support Vector Machines. In terms of the code identifiers that are most strongly associated with test flakiness, we noted that job, action, and services are commonly associated with flaky tests. Overall, our results provide initial yet strong evidence that static detection of flaky tests is effective.
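The two steps named in the description (labelling a test from repeated runs, and counting stemmed tokens taken from source code identifiers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the identifier-splitting regular expression, the suffix-stripping stand-in for a real stemmer, and the sample test source are all assumptions made for illustration. The resulting counts are the kind of feature vector that could be passed to a classifier such as Random Forest.

```python
import re
from collections import Counter


def label_from_reruns(outcomes):
    """Label one test from its repeated-run outcomes.

    Assumption: a test is 'flaky' if its outcomes disagree across runs.
    """
    return "flaky" if len(set(outcomes)) > 1 else "stable"


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]


def crude_stem(word):
    """Strip a few common suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def token_counts(source):
    """Count stemmed tokens over all identifier-like words in test source."""
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    stems = (
        crude_stem(word)
        for identifier in identifiers
        for word in split_identifier(identifier)
    )
    return Counter(stems)


# Hypothetical test body, used only to show the shape of the features.
sample = "void testJobServices() { jobRunner.waitForJobs(); }"
features = token_counts(sample)
```

Here `features["job"]` is 3, because `testJobServices`, `jobRunner`, and `waitForJobs` each contribute one occurrence of the stem `job`.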