Learning to interpret knowledge from software Q&A sites

Nowadays, software question and answer (SQA) data has become a treasure for software engineering as it contains a huge volume of programming knowledge. That knowledge can be interpreted in many different ways to support various software activities, such as code recommendation, program repair, and so...

Full description

Saved in:
Bibliographic Details
Main Author: XU, Bowen
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects:
Online Access:https://ink.library.smu.edu.sg/etd_coll/377
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1375&context=etd_coll
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.etd_coll-1375
record_format dspace
spelling sg-smu-ink.etd_coll-13752022-03-01T01:41:13Z Learning to interpret knowledge from software Q&A sites XU, Bowen Nowadays, software question and answer (SQA) data has become a treasure for software engineering as it contains a huge volume of programming knowledge. That knowledge can be interpreted in many different ways to support various software activities, such as code recommendation, program repair, and so on. In this dissertation, we interpret SQA data by addressing three novel research problems. The first research problem is about linkable knowledge unit prediction. In this problem, a question and its answers within a post in Stack Overflow are considered as a knowledge unit (KU). KUs often contain semantically relevant knowledge, and thus linkable for different purposes. Being able to classify different classes of linkable knowledge units would support more targeted information needs when users search or explore the linkable knowledge. Compare with the approaches proposed in prior works, we design a relatively simpler but more effective machine learning model to address the problem. Moreover, we discover the limitation of the dataset used in the previous works and construct a new one with a larger size and higher diversity. Our experimental result shows that our model outperforms the state-of-the-art approaches significantly. The second research problem is about distributed representation for Stack Overflow posts. In this dissertation, we propose a specialized deep learning architecture Post2Vec which extracts distributed representations of Stack Overflow posts. To evaluate Post2Vec, we first investigate its end-to-end effectiveness in tag recommendation task. We observe that Post2Vec achieves significant improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins. The third research problem is about answer summary generation for technical questions. We formulate the task as a query-focused multi-answer-posts summarization task for a given technical question. We conduct user studies to evaluate the quality of the answer summaries generated by our approach. The user study results demonstrate those answer summaries generated by AnswerBot are relevant, useful, and diverse. 2021-08-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/etd_coll/377 https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1375&context=etd_coll http://creativecommons.org/licenses/by-nc-nd/4.0/ Dissertations and Theses Collection (Open Access) eng Institutional Knowledge at Singapore Management University Software Question and Answer Machine Learning Mining Software Repository Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Software Question and Answer
Machine Learning
Mining Software Repository
Software Engineering
spellingShingle Software Question and Answer
Machine Learning
Mining Software Repository
Software Engineering
XU, Bowen
Learning to interpret knowledge from software Q&A sites
description Nowadays, software question and answer (SQA) data has become a treasure for software engineering as it contains a huge volume of programming knowledge. That knowledge can be interpreted in many different ways to support various software activities, such as code recommendation, program repair, and so on. In this dissertation, we interpret SQA data by addressing three novel research problems. The first research problem is about linkable knowledge unit prediction. In this problem, a question and its answers within a post in Stack Overflow are considered as a knowledge unit (KU). KUs often contain semantically relevant knowledge, and thus linkable for different purposes. Being able to classify different classes of linkable knowledge units would support more targeted information needs when users search or explore the linkable knowledge. Compare with the approaches proposed in prior works, we design a relatively simpler but more effective machine learning model to address the problem. Moreover, we discover the limitation of the dataset used in the previous works and construct a new one with a larger size and higher diversity. Our experimental result shows that our model outperforms the state-of-the-art approaches significantly. The second research problem is about distributed representation for Stack Overflow posts. In this dissertation, we propose a specialized deep learning architecture Post2Vec which extracts distributed representations of Stack Overflow posts. To evaluate Post2Vec, we first investigate its end-to-end effectiveness in tag recommendation task. We observe that Post2Vec achieves significant improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins. The third research problem is about answer summary generation for technical questions. We formulate the task as a query-focused multi-answer-posts summarization task for a given technical question. We conduct user studies to evaluate the quality of the answer summaries generated by our approach. The user study results demonstrate those answer summaries generated by AnswerBot are relevant, useful, and diverse.
format text
author XU, Bowen
author_facet XU, Bowen
author_sort XU, Bowen
title Learning to interpret knowledge from software Q&A sites
title_short Learning to interpret knowledge from software Q&A sites
title_full Learning to interpret knowledge from software Q&A sites
title_fullStr Learning to interpret knowledge from software Q&A sites
title_full_unstemmed Learning to interpret knowledge from software Q&A sites
title_sort learning to interpret knowledge from software q&a sites
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
url https://ink.library.smu.edu.sg/etd_coll/377
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1375&context=etd_coll
_version_ 1745575013642141696