An empirical study of bugs in machine learning systems

Many machine learning systems that include various data mining, information retrieval, and natural language processing code and libraries have been used in real-world applications. Search engines, internet advertising systems, and product recommendation systems are sample users of these algorithm inten...

Bibliographic Details
Main Authors:	THUNG, Ferdian, WANG, Shaowei, LO, David, JIANG, Lingxiao
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2012
Subjects:	Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/1587 https://ink.library.smu.edu.sg/context/sis_research/article/2586/viewcontent/issre12_EmpiricalStudyBugsMachineLearningSys.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-2586
record_format	dspace
spelling	sg-smu-ink.sis_research-2586 2017-02-05T07:54:31Z 2012-11-01T07:00:00Z text application/pdf info:doi/10.1109/ISSRE.2012.22 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Software Engineering
spellingShingle	Software Engineering THUNG, Ferdian WANG, Shaowei LO, David JIANG, Lingxiao An empirical study of bugs in machine learning systems
description	Many machine learning systems that include various data mining, information retrieval, and natural language processing code and libraries have been used in real-world applications. Search engines, internet advertising systems, and product recommendation systems are sample users of these algorithm-intensive code and libraries. Machine learning code and toolkits have also been used in many recent studies on software mining and analytics that aim to automate various software engineering tasks. With the increasing number of important applications of machine learning systems, the reliability of such systems is also becoming increasingly important. A necessary step for ensuring the reliability of such systems is to understand the features and characteristics of the bugs that occur in them. A number of studies have investigated bugs and fixes in various software systems, but none focuses on machine learning systems. Machine learning systems are unique due to their algorithm-intensive nature and their application to potentially large-scale data, and thus deserve special consideration. In this study, we fill the research gap by performing an empirical study of the bugs that appear in machine learning systems. We analyze three systems, namely Apache Mahout, Lucene, and OpenNLP, which are data mining, information retrieval, and natural language processing tools, respectively. We look into their bug databases and code repositories, analyze existing bugs and corresponding fixes, and label the bugs into various categories. Our study finds that 22.6% of the bugs belong to the algorithm/method category, 15.6% of the bugs belong to the non-functional category, and 13% of the bugs belong to the assignment/initialization category. We also report the relationship between the categories of bugs and their severity, the time and effort needed to fix the bugs, and their impact. We highlight several categories of bugs that deserve attention in future research.
format	text
author	THUNG, Ferdian WANG, Shaowei LO, David JIANG, Lingxiao
author_facet	THUNG, Ferdian WANG, Shaowei LO, David JIANG, Lingxiao
author_sort	THUNG, Ferdian
title	An empirical study of bugs in machine learning systems
title_short	An empirical study of bugs in machine learning systems
title_full	An empirical study of bugs in machine learning systems
title_fullStr	An empirical study of bugs in machine learning systems
title_full_unstemmed	An empirical study of bugs in machine learning systems
title_sort	empirical study of bugs in machine learning systems
publisher	Institutional Knowledge at Singapore Management University
publishDate	2012
url	https://ink.library.smu.edu.sg/sis_research/1587 https://ink.library.smu.edu.sg/context/sis_research/article/2586/viewcontent/issre12_EmpiricalStudyBugsMachineLearningSys.pdf
_version_	1770571309801013248

Similar Items