Automating developer chat mining

Online chatrooms are gaining popularity as a communication channel between widely distributed developers of Open Source Software (OSS) projects. Most discussion threads in chatrooms follow a Q&A format, with some developers (askers) raising an initial question and others (respondents) joining in...

Bibliographic Details
Main Authors: PAN, Shengyi, BAO, Lingfeng, REN, Xiaoxue, XIA, Xin, LO, David, LI, Shanping
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
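Subjects: Developer Chatrooms; Information Mining; Deep Learning; Gitter; Databases and Information Systems; Software Engineering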
Online Access: https://ink.library.smu.edu.sg/sis_research/6809
https://ink.library.smu.edu.sg/context/sis_research/article/7812/viewcontent/Automating_Developer_Chat_Mining.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-7812
record_format dspace
spelling sg-smu-ink.sis_research-7812 2023-04-25T05:14:11Z 2021-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6809 info:doi/10.1109/ASE51524.2021.9678923 https://ink.library.smu.edu.sg/context/sis_research/article/7812/viewcontent/Automating_Developer_Chat_Mining.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Developer Chatrooms
Information Mining
Deep Learning
Gitter
Databases and Information Systems
Software Engineering
description Online chatrooms are gaining popularity as a communication channel between widely distributed developers of Open Source Software (OSS) projects. Most discussion threads in chatrooms follow a Q&A format, with some developers (askers) raising an initial question and others (respondents) joining in to provide answers. These discussion threads are embedded with rich information that can satisfy the diverse needs of various OSS stakeholders. However, retrieving information from threads is challenging as it requires a thread-level analysis to understand the context. Moreover, the chat data is transient and unstructured, consisting of entangled informal conversations. In this paper, we address this challenge by identifying the information types available in developer chats and further introducing an automated mining technique. Through manual examination of chat data from three chatrooms on Gitter, using card sorting, we build a thread-level taxonomy with nine information categories and create a labeled dataset with 2,959 threads. We propose a classification approach (named F2CHAT) to structure the vast amount of threads based on the information type automatically, helping stakeholders quickly acquire their desired information. F2CHAT effectively combines handcrafted non-textual features with deep textual features extracted by neural models. Specifically, it has two stages, with the first one leveraging the siamese architecture to pretrain the textual feature encoder, and the second one facilitating an in-depth fusion of the two types of features. Evaluation results suggest that our approach achieves an average F1-score of 0.628, which improves the baseline by 57%. Experiments also verify the effectiveness of our identified non-textual features under both intra-project and cross-project validations.
format text
author PAN, Shengyi
BAO, Lingfeng
REN, Xiaoxue
XIA, Xin
LO, David
LI, Shanping
author_sort PAN, Shengyi
title Automating developer chat mining
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
url https://ink.library.smu.edu.sg/sis_research/6809
https://ink.library.smu.edu.sg/context/sis_research/article/7812/viewcontent/Automating_Developer_Chat_Mining.pdf
_version_ 1770576072989999104