Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search

With software penetrating all kinds of traditional and emerging industries, there is great demand for software development. Given the limited number of developers, one important way to meet this urgent need is to significantly improve developers’ productivity. As the...

Bibliographic Details
Main Author: Chen, Chunyang
Other Authors: Liu Yang
Format: Theses and Dissertations
Language: English
Published: 2018
Subjects: DRNTU::Engineering::Computer science and engineering::Software::Software engineering
Online Access: http://hdl.handle.net/10356/75873
Institution: Nanyang Technological University
id sg-ntu-dr.10356-75873
record_format dspace
spelling sg-ntu-dr.10356-75873
2023-03-04T00:52:40Z
Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search
Chen, Chunyang
Liu Yang
School of Computer Science and Engineering
Xing Zhenchang
DRNTU::Engineering::Computer science and engineering::Software::Software engineering
Doctor of Philosophy (SCE)
2018-07-02T01:57:25Z
2018-07-02T01:57:25Z
2018
Thesis
Chen, C. (2017). Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search. Doctoral thesis, Nanyang Technological University, Singapore.
http://hdl.handle.net/10356/75873
10.32657/10356/75873
en
183 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Software::Software engineering
description With software penetrating all kinds of traditional and emerging industries, there is great demand for software development. Given the limited number of developers, one important way to meet this urgent need is to significantly improve developers’ productivity. As the most popular Q&A site, Stack Overflow has accumulated abundant software development knowledge. Effectively leveraging this large-scale data can help developers reuse the experience recorded there and further improve their working efficiency. However, the rich yet unstructured large-scale data in Stack Overflow is difficult to search for two reasons. First, the site contains a very large number of questions and answers, and there may be a lingual gap (the same meaning can be written in different languages) between the query and the content in Stack Overflow. In addition, the decay of information quality, such as misspelling, inconsistency, and abuse of domain-specific abbreviations, degrades search performance. Second, some higher-order knowledge in Stack Overflow is only implicit, so it cannot be searched directly and must first be distilled from the existing raw data. In this thesis, I present methods for supporting developers’ information search over Stack Overflow. To overcome the lexical gap and information decay, I also develop an edit recommendation tool to ensure the quality of Stack Overflow posts so that they can be more easily retrieved by queries. But such explicit information search still requires developers to read, understand, and summarize, which is time-consuming. So I propose to shift from document (information) search to entity (knowledge) search by mining the implicit knowledge from tags in Stack Overflow, giving developers direct answers instead of asking them to read lengthy documents. I first build a basic software-specific knowledge graph, including thousands of software-engineering terms and their associations, by association rule mining and community detection. Then, I enrich the knowledge graph with more fine-grained relationships, i.e., analogies among different third-party libraries. Finally, I combine both semantic and lexical information to infer morphological forms of software terms so that the knowledge graph is more robust for knowledge search.
author2 Liu Yang
format Theses and Dissertations
author Chen, Chunyang
title Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search
publishDate 2018
url http://hdl.handle.net/10356/75873
_version_ 1759857990944423936