Automating intention mining

Developers frequently discuss aspects of the systems they are developing online. The comments they post to discussions form a rich information source about the system. Intention mining, a process introduced by Di Sorbo et al., classifies sentences in developer discussions to enable further analysis....

Bibliographic Details
Main Authors:	HUANG, Qiao; XIA, Xin; LO, David; MURPHY, Gail C.
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2018
Subjects:	Tuning; Data mining; Computer bugs; Software; Linguistics; Training; Taxonomy; Numerical Analysis and Scientific Computing; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/4354 https://ink.library.smu.edu.sg/context/sis_research/article/5357/viewcontent/Automating_Intention_Mining_tse_2018.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5357
record_format	dspace
spelling	sg-smu-ink.sis_research-5357 2022-10-03T08:34:20Z 2018-10-01T07:00:00Z text application/pdf info:doi/10.1109/TSE.2018.2876340 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Tuning; Data mining; Computer bugs; Software; Linguistics; Training; Taxonomy; Numerical Analysis and Scientific Computing; Software Engineering
description	Developers frequently discuss aspects of the systems they are developing online. The comments they post to discussions form a rich information source about the system. Intention mining, a process introduced by Di Sorbo et al., classifies sentences in developer discussions to enable further analysis. As one example of use, intention mining has been used to help build various recommenders for software developers. The technique introduced by Di Sorbo et al. to categorize sentences is based on linguistic patterns derived from two projects. The limited number of data sources used in this earlier work raises questions about the comprehensiveness of the intention categories and about whether the linguistic patterns used to identify the categories generalize to developer discussions recorded in other kinds of software artifacts (e.g., issue reports). To assess the comprehensiveness of the previously identified intention categories and the generalizability of the linguistic patterns for category identification, we manually created a new dataset, categorizing 5,408 sentences from issue reports of four projects in GitHub. Based on this manual effort, we refined the previous categories. We assess Di Sorbo et al.'s patterns on this dataset, finding that the accuracy achieved is low (0.31). To address the deficiencies of Di Sorbo et al.'s patterns, we propose and investigate a convolutional neural network (CNN)-based approach to automatically classify sentences into different categories of intentions. Our approach optimizes the CNN by integrating batch normalization to accelerate training and by using an automatic hyperparameter tuning approach to select the CNN's hyperparameters. Our approach achieves an accuracy of 0.84 on the new dataset, improving on Di Sorbo et al.'s approach by 171%. We also apply our approach to improve an automated software engineering task, in which we use our proposed approach to rectify misclassified issue reports, thus reducing the bias introduced by such data to other studies. A case study on four open source projects with 2,076 issue reports shows that our approach achieves an average AUC score of 0.687, which improves on other baselines by at least 16%.
format	text
author	HUANG, Qiao; XIA, Xin; LO, David; MURPHY, Gail C.
author_sort	HUANG, Qiao
title	Automating intention mining
publisher	Institutional Knowledge at Singapore Management University
publishDate	2018
url	https://ink.library.smu.edu.sg/sis_research/4354 https://ink.library.smu.edu.sg/context/sis_research/article/5357/viewcontent/Automating_Intention_Mining_tse_2018.pdf
_version_	1770574684783378432
