What do users ask in open-source AI repositories? An empirical study of GitHub issues

Artificial Intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these...

Bibliographic Details
Main Authors: YANG, Zhou; WANG, Chenyu; SHI, Jieke; HOANG, Thong; KOCHHAR, Pavneet Singh; LU, Qinghua; XING, Zhenchang; LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/8571
https://ink.library.smu.edu.sg/context/sis_research/article/9574/viewcontent/what_do_users.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9574
record_format dspace
spelling sg-smu-ink.sis_research-9574 2024-01-25T08:59:52Z What do users ask in open-source AI repositories? An empirical study of GitHub issues YANG, Zhou; WANG, Chenyu; SHI, Jieke; HOANG, Thong; KOCHHAR, Pavneet Singh; LU, Qinghua; XING, Zhenchang; LO, David Artificial Intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these AI systems is inseparable from open-source software (OSS). Specifically, many benchmarks, implementations, and frameworks for constructing AI systems are made open source and accessible to the public, allowing researchers and practitioners to reproduce the reported results and broaden the application of AI systems. The development of AI systems follows a data-driven paradigm and is sensitive to hyperparameter settings and data separation. Developers may encounter unique problems when employing open-source AI repositories. This paper presents an empirical study that investigates the issues in open-source AI repositories to assist developers in understanding problems during the process of employing AI systems. We collect 576 repositories from the PapersWithCode platform. Among these repositories, we find 24,953 issues by utilizing GitHub REST APIs. Our empirical study includes three phases. First, we manually analyze these issues to categorize the problems that developers are likely to encounter in open-source AI repositories. Specifically, we provide a taxonomy of 13 categories related to AI systems. The two most common issues are runtime errors (23.18%) and unclear instructions (19.53%). Second, we see that 67.5% of issues are closed. We also find that half of these issues are resolved within four days. Moreover, issue management features, e.g., label and assign, are not widely adopted in open-source AI repositories. In particular, only 7.81% and 5.9% of repositories label issues and assign these issues to assignees, respectively. Finally, we empirically show that employing GitHub issue management features and writing issues with detailed descriptions facilitate the resolution of issues. Based on our findings, we make recommendations for developers to help better manage the issues of open-source AI repositories and improve their quality. 2023-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8571 info:doi/10.1109/MSR59073.2023.00024 https://ink.library.smu.edu.sg/context/sis_research/article/9574/viewcontent/what_do_users.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial intelligence repository Artificial intelligence systems Best development practice Development practices Empirical studies Mining software Mining software repository Open-source Open-source software Software repositories Artificial Intelligence and Robotics Databases and Information Systems Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Artificial intelligence repository
Artificial intelligence systems
Best development practice
Development practices
Empirical studies
Mining software
Mining software repository
Open-source
Open-source software
Software repositories
Artificial Intelligence and Robotics
Databases and Information Systems
Software Engineering
description Artificial Intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these AI systems is inseparable from open-source software (OSS). Specifically, many benchmarks, implementations, and frameworks for constructing AI systems are made open source and accessible to the public, allowing researchers and practitioners to reproduce the reported results and broaden the application of AI systems. The development of AI systems follows a data-driven paradigm and is sensitive to hyperparameter settings and data separation. Developers may encounter unique problems when employing open-source AI repositories. This paper presents an empirical study that investigates the issues in open-source AI repositories to assist developers in understanding problems during the process of employing AI systems. We collect 576 repositories from the PapersWithCode platform. Among these repositories, we find 24,953 issues by utilizing GitHub REST APIs. Our empirical study includes three phases. First, we manually analyze these issues to categorize the problems that developers are likely to encounter in open-source AI repositories. Specifically, we provide a taxonomy of 13 categories related to AI systems. The two most common issues are runtime errors (23.18%) and unclear instructions (19.53%). Second, we see that 67.5% of issues are closed. We also find that half of these issues are resolved within four days. Moreover, issue management features, e.g., label and assign, are not widely adopted in open-source AI repositories. In particular, only 7.81% and 5.9% of repositories label issues and assign these issues to assignees, respectively. Finally, we empirically show that employing GitHub issue management features and writing issues with detailed descriptions facilitate the resolution of issues. Based on our findings, we make recommendations for developers to help better manage the issues of open-source AI repositories and improve their quality.
format text
author YANG, Zhou
WANG, Chenyu
SHI, Jieke
HOANG, Thong
KOCHHAR, Pavneet Singh
LU, Qinghua
XING, Zhenchang
LO, David
author_sort YANG, Zhou
title What do users ask in open-source AI repositories? An empirical study of GitHub issues
publisher Institutional Knowledge at Singapore Management University
publishDate 2023
url https://ink.library.smu.edu.sg/sis_research/8571
https://ink.library.smu.edu.sg/context/sis_research/article/9574/viewcontent/what_do_users.pdf
_version_ 1789483278278852608