What do users ask in open-source AI repositories? An empirical study of GitHub issues
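Main Authors: | YANG, Zhou; WANG, Chenyu; SHI, Jieke; HOANG, Thong; KOCHHAR, Pavneet Singh; LU, Qinghua; XING, Zhenchang; LO, David |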
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2023 |
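Subjects: | Artificial intelligence repository; Artificial intelligence systems; Best development practice; Development practices; Empirical studies; Mining software; Mining software repository; Open-source; Open-source software; Software repositories; Artificial Intelligence and Robotics; Databases and Information Systems; Software Engineering |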
Online Access: | https://ink.library.smu.edu.sg/sis_research/8571 https://ink.library.smu.edu.sg/context/sis_research/article/9574/viewcontent/what_do_users.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-9574 |
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-9574 2024-01-25T08:59:52Z What do users ask in open-source AI repositories? An empirical study of GitHub issues YANG, Zhou; WANG, Chenyu; SHI, Jieke; HOANG, Thong; KOCHHAR, Pavneet Singh; LU, Qinghua; XING, Zhenchang; LO, David Artificial Intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these AI systems is inseparable from open-source software (OSS). Specifically, many benchmarks, implementations, and frameworks for constructing AI systems are made open source and accessible to the public, allowing researchers and practitioners to reproduce the reported results and broaden the application of AI systems. The development of AI systems follows a data-driven paradigm and is sensitive to hyperparameter settings and data separation. Developers may encounter unique problems when employing open-source AI repositories. This paper presents an empirical study that investigates the issues in open-source AI repositories to help developers understand the problems they face when employing AI systems. We collect 576 repositories from the PapersWithCode platform. Among these repositories, we find 24,953 issues by using the GitHub REST API. Our empirical study includes three phases. First, we manually analyze these issues to categorize the problems that developers are likely to encounter in open-source AI repositories. Specifically, we provide a taxonomy of 13 categories related to AI systems. The two most common issues are runtime errors (23.18%) and unclear instructions (19.53%). Second, we see that 67.5% of issues are closed. We also find that half of these issues are resolved within four days. Moreover, issue management features, e.g., labels and assignees, are not widely adopted in open-source AI repositories. 
In particular, only 7.81% and 5.9% of repositories label issues and assign these issues to assignees, respectively. Finally, we empirically show that employing GitHub issue management features and writing issues with detailed descriptions facilitate the resolution of issues. Based on our findings, we make recommendations for developers to help better manage the issues of open-source AI repositories and improve their quality. 2023-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8571 info:doi/10.1109/MSR59073.2023.00024 https://ink.library.smu.edu.sg/context/sis_research/article/9574/viewcontent/what_do_users.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Artificial intelligence repository Artificial intelligence systems Best development practice Development practices Empirical studies Mining software Mining software repository Open-source Open-source software Software repositories Artificial Intelligence and Robotics Databases and Information Systems Software Engineering |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Artificial intelligence repository; Artificial intelligence systems; Best development practice; Development practices; Empirical studies; Mining software; Mining software repository; Open-source; Open-source software; Software repositories; Artificial Intelligence and Robotics; Databases and Information Systems; Software Engineering |
description |
Artificial Intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these AI systems is inseparable from open-source software (OSS). Specifically, many benchmarks, implementations, and frameworks for constructing AI systems are made open source and accessible to the public, allowing researchers and practitioners to reproduce the reported results and broaden the application of AI systems. The development of AI systems follows a data-driven paradigm and is sensitive to hyperparameter settings and data separation. Developers may encounter unique problems when employing open-source AI repositories. This paper presents an empirical study that investigates the issues in open-source AI repositories to help developers understand the problems they face when employing AI systems. We collect 576 repositories from the PapersWithCode platform. Among these repositories, we find 24,953 issues by using the GitHub REST API. Our empirical study includes three phases. First, we manually analyze these issues to categorize the problems that developers are likely to encounter in open-source AI repositories. Specifically, we provide a taxonomy of 13 categories related to AI systems. The two most common issues are runtime errors (23.18%) and unclear instructions (19.53%). Second, we see that 67.5% of issues are closed. We also find that half of these issues are resolved within four days. Moreover, issue management features, e.g., labels and assignees, are not widely adopted in open-source AI repositories. In particular, only 7.81% and 5.9% of repositories label issues and assign these issues to assignees, respectively. 
Finally, we empirically show that employing GitHub issue management features and writing issues with detailed descriptions facilitate the resolution of issues. Based on our findings, we make recommendations for developers to help better manage the issues of open-source AI repositories and improve their quality. |
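The summary statistics reported above (share of closed issues, the time within which half of the closed issues are resolved, and the share of repositories that label or assign issues) can be approximated with a short Python sketch. It assumes the issues have already been downloaded from GitHub's REST API issues endpoint (`GET /repos/{owner}/{repo}/issues?state=all`), reads "half resolved within N days" as the median time-to-close, and counts a repository as using labels or assignees if at least one of its issues has one. These definitions, the helper names (`only_issues`, `closed_ratio`, `median_days_to_close`, `share_of_repos_using`), and the toy data are illustrative assumptions, not the authors' pipeline.

```python
from datetime import datetime
from statistics import median

# Sketch only: each issue is assumed to be a dict shaped like the JSON objects
# returned by GitHub's REST endpoint GET /repos/{owner}/{repo}/issues?state=all.
# Only these keys are used: "state", "created_at", "closed_at", "labels",
# "assignees", and "pull_request" (present only on pull requests).


def parse_ts(ts):
    """Parse a GitHub ISO-8601 timestamp such as '2023-05-01T07:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def only_issues(items):
    """Drop pull requests, which the issues endpoint also returns."""
    return [it for it in items if "pull_request" not in it]


def closed_ratio(issues):
    """Fraction of issues whose state is 'closed'."""
    if not issues:
        return 0.0
    return sum(it["state"] == "closed" for it in issues) / len(issues)


def median_days_to_close(issues):
    """Median time from creation to closing, in days (closed issues only)."""
    days = [
        (parse_ts(it["closed_at"]) - parse_ts(it["created_at"])).total_seconds() / 86400
        for it in issues
        if it["state"] == "closed" and it.get("closed_at")
    ]
    return median(days) if days else None


def share_of_repos_using(issues_by_repo, field):
    """Fraction of repositories where at least one issue has a non-empty
    `field` list ('labels' or 'assignees'). The 'at least one issue' rule is
    an assumption, not necessarily the paper's definition."""
    if not issues_by_repo:
        return 0.0
    using = sum(any(it.get(field) for it in issues) for issues in issues_by_repo.values())
    return using / len(issues_by_repo)


# Hypothetical toy data for two made-up repositories (not from the study).
raw = {
    "example/model": [
        {"state": "closed", "created_at": "2023-01-01T00:00:00Z",
         "closed_at": "2023-01-03T00:00:00Z", "labels": [{"name": "bug"}], "assignees": []},
        {"state": "open", "created_at": "2023-01-02T00:00:00Z",
         "closed_at": None, "labels": [], "assignees": []},
        {"state": "closed", "created_at": "2023-01-05T00:00:00Z",
         "closed_at": "2023-01-06T00:00:00Z", "labels": [], "assignees": [],
         "pull_request": {}},
    ],
    "example/bench": [
        {"state": "closed", "created_at": "2023-02-01T00:00:00Z",
         "closed_at": "2023-02-07T00:00:00Z", "labels": [], "assignees": []},
    ],
}

issues_by_repo = {name: only_issues(items) for name, items in raw.items()}
all_issues = [it for issues in issues_by_repo.values() for it in issues]

print(f"closed: {closed_ratio(all_issues):.2%}")                                    # closed: 66.67%
print(f"median days to close: {median_days_to_close(all_issues)}")                  # median days to close: 4.0
print(f"repos labelling: {share_of_repos_using(issues_by_repo, 'labels'):.0%}")     # repos labelling: 50%
print(f"repos assigning: {share_of_repos_using(issues_by_repo, 'assignees'):.0%}")  # repos assigning: 0%
```

Filtering out pull requests matters here because GitHub's issues endpoint returns pull requests as well; without that step, closed or merged pull requests would inflate the closed-issue ratio.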
format |
text |
author |
YANG, Zhou; WANG, Chenyu; SHI, Jieke; HOANG, Thong; KOCHHAR, Pavneet Singh; LU, Qinghua; XING, Zhenchang; LO, David |
author_sort |
YANG, Zhou |
title |
What do users ask in open-source AI repositories? An empirical study of GitHub issues |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2023 |
url |
https://ink.library.smu.edu.sg/sis_research/8571 https://ink.library.smu.edu.sg/context/sis_research/article/9574/viewcontent/what_do_users.pdf |