Automating developer chat mining
Online chatrooms are gaining popularity as a communication channel between widely distributed developers of Open Source Software (OSS) projects. Most discussion threads in chatrooms follow a Q&A format, with some developers (askers) raising an initial question and others (respondents) joining in...
Main Authors: | PAN, Shengyi; BAO, Lingfeng; REN, Xiaoxue; XIA, Xin; LO, David; LI, Shanping |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University 2021 |
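Subjects: | Developer Chatrooms; Information Mining; Deep Learning; Gitter; Databases and Information Systems; Software Engineering |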
Online Access: | https://ink.library.smu.edu.sg/sis_research/6809 https://ink.library.smu.edu.sg/context/sis_research/article/7812/viewcontent/Automating_Developer_Chat_Mining.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-7812 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-7812 2023-04-25T05:14:11Z Automating developer chat mining PAN, Shengyi BAO, Lingfeng REN, Xiaoxue XIA, Xin LO, David LI, Shanping Online chatrooms are gaining popularity as a communication channel between widely distributed developers of Open Source Software (OSS) projects. Most discussion threads in chatrooms follow a Q&A format, with some developers (askers) raising an initial question and others (respondents) joining in to provide answers. These discussion threads are embedded with rich information that can satisfy the diverse needs of various OSS stakeholders. However, retrieving information from threads is challenging as it requires a thread-level analysis to understand the context. Moreover, the chat data is transient and unstructured, consisting of entangled informal conversations. In this paper, we address this challenge by identifying the information types available in developer chats and further introducing an automated mining technique. Through manual examination of chat data from three chatrooms on Gitter, using card sorting, we build a thread-level taxonomy with nine information categories and create a labeled dataset with 2,959 threads. We propose a classification approach (named F2CHAT) to structure the vast amount of threads based on the information type automatically, helping stakeholders quickly acquire their desired information. F2CHAT effectively combines handcrafted non-textual features with deep textual features extracted by neural models. Specifically, it has two stages with the first one leveraging the siamese architecture to pretrain the textual feature encoder, and the second one facilitating an in-depth fusion of two types of features. Evaluation results suggest that our approach achieves an average F1-score of 0.628, which improves the baseline by 57%. Experiments also verify the effectiveness of our identified non-textual features under both intra-project and cross-project validations. 2021-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6809 info:doi/10.1109/ASE51524.2021.9678923 https://ink.library.smu.edu.sg/context/sis_research/article/7812/viewcontent/Automating_Developer_Chat_Mining.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Developer Chatrooms Information Mining Deep Learning Gitter Databases and Information Systems Software Engineering |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Developer Chatrooms; Information Mining; Deep Learning; Gitter; Databases and Information Systems; Software Engineering |
spellingShingle |
Developer Chatrooms Information Mining Deep Learning Gitter Databases and Information Systems Software Engineering PAN, Shengyi BAO, Lingfeng REN, Xiaoxue XIA, Xin LO, David LI, Shanping Automating developer chat mining |
description |
Online chatrooms are gaining popularity as a communication channel between widely distributed developers of Open Source Software (OSS) projects. Most discussion threads in chatrooms follow a Q&A format, with some developers (askers) raising an initial question and others (respondents) joining in to provide answers. These discussion threads are embedded with rich information that can satisfy the diverse needs of various OSS stakeholders. However, retrieving information from threads is challenging as it requires a thread-level analysis to understand the context. Moreover, the chat data is transient and unstructured, consisting of entangled informal conversations. In this paper, we address this challenge by identifying the information types available in developer chats and further introducing an automated mining technique. Through manual examination of chat data from three chatrooms on Gitter, using card sorting, we build a thread-level taxonomy with nine information categories and create a labeled dataset with 2,959 threads. We propose a classification approach (named F2CHAT) to structure the vast amount of threads based on the information type automatically, helping stakeholders quickly acquire their desired information. F2CHAT effectively combines handcrafted non-textual features with deep textual features extracted by neural models. Specifically, it has two stages with the first one leveraging the siamese architecture to pretrain the textual feature encoder, and the second one facilitating an in-depth fusion of two types of features. Evaluation results suggest that our approach achieves an average F1-score of 0.628, which improves the baseline by 57%. Experiments also verify the effectiveness of our identified non-textual features under both intra-project and cross-project validations. |
format |
text |
author |
PAN, Shengyi; BAO, Lingfeng; REN, Xiaoxue; XIA, Xin; LO, David; LI, Shanping |
author_facet |
PAN, Shengyi; BAO, Lingfeng; REN, Xiaoxue; XIA, Xin; LO, David; LI, Shanping |
author_sort |
PAN, Shengyi |
title |
Automating developer chat mining |
title_short |
Automating developer chat mining |
title_full |
Automating developer chat mining |
title_fullStr |
Automating developer chat mining |
title_full_unstemmed |
Automating developer chat mining |
title_sort |
automating developer chat mining |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2021 |
url |
https://ink.library.smu.edu.sg/sis_research/6809 https://ink.library.smu.edu.sg/context/sis_research/article/7812/viewcontent/Automating_Developer_Chat_Mining.pdf |
_version_ |
1770576072989999104 |
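
Note on the evaluation figures in the description: the abstract reports an average F1-score of 0.628 and a 57% improvement over the baseline, but this record does not say how the average is taken or what the baseline score is. The sketch below is only an illustration under two stated assumptions: that "average" means the unweighted (macro) mean of per-category F1, and that "57%" is a relative gain. The category labels `A`, `B`, `C` and the toy predictions are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed meaning of 'average')."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def implied_baseline(score, relative_gain):
    """Baseline score implied by a relative improvement (an assumption)."""
    return score / (1.0 + relative_gain)


# Hypothetical thread labels, for illustration only.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

if __name__ == "__main__":
    print(per_class_f1(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
    print(implied_baseline(0.628, 0.57))
```

Under the relative-gain reading, 0.628 / 1.57 gives an implied baseline F1 of about 0.40. If the paper instead uses a weighted average or an absolute difference, the numbers would differ, so the full text should be consulted before relying on this.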