Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search

With software penetrating all kinds of traditional and emerging industries, there is great demand for software development. Given the limited number of developers, one important way to meet this urgent need is to significantly improve developers’ productivity. As the...



Bibliographic Details
Main Author:	Chen, Chunyang
Other Authors:	Liu Yang
Format:	Theses and Dissertations
Language:	English
Published:	2018
Subjects:	DRNTU::Engineering::Computer science and engineering::Software::Software engineering
Online Access:	http://hdl.handle.net/10356/75873
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-75873
record_format	dspace
spelling	sg-ntu-dr.10356-75873; 2023-03-04T00:52:40Z; Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search; Chen, Chunyang; Liu Yang; School of Computer Science and Engineering; Xing Zhenchang; DRNTU::Engineering::Computer science and engineering::Software::Software engineering; Doctor of Philosophy (SCE); 2018-07-02T01:57:25Z; 2018; Thesis; Chen, C. (2017). Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search. Doctoral thesis, Nanyang Technological University, Singapore.; http://hdl.handle.net/10356/75873; 10.32657/10356/75873; en; 183 p.; application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Software::Software engineering
description	With software penetrating all kinds of traditional and emerging industries, there is great demand for software development. Given the limited number of developers, one important way to meet this urgent need is to significantly improve developers’ productivity. As the most popular Q&A site, Stack Overflow has accumulated abundant software development knowledge. Effectively leveraging such big data can help developers reuse the experience recorded there and further improve their working efficiency. However, the rich yet unstructured large-scale data in Stack Overflow is difficult to search, for two reasons. First, there are too many questions and answers on the site, and there may be a lingual gap (the same meaning can be written in different languages) between a query and the content of Stack Overflow. In addition, the decay of information quality, such as misspellings, inconsistencies, and abuse of domain-specific abbreviations, degrades search performance. Second, some higher-order knowledge in Stack Overflow is implicit and must be distilled from the existing raw data before it can be searched. In this thesis, I present methods for supporting developers’ information search over Stack Overflow. To overcome the lexical gap and information decay, I develop an edit recommendation tool that helps ensure the quality of Stack Overflow posts so that queries can find them more easily. However, such explicit information search still requires developers to read, understand, and summarize, which is time-consuming. I therefore propose to shift from document (information) search to entity (knowledge) search by mining the implicit knowledge in Stack Overflow tags, so that developers receive direct answers instead of having to read lengthy documents. I first build a basic software-specific knowledge graph containing thousands of software engineering terms and their associations, using association rule mining and community detection.
Then, I enrich the knowledge graph with finer-grained relationships, i.e., analogies among different third-party libraries. Finally, I combine semantic and lexical information to infer the morphological forms of software terms, making the knowledge graph more robust for knowledge search.
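The abstract names association rule mining and community detection over Stack Overflow tags as the basis of the knowledge graph, without describing either step. The following is a minimal Python sketch under stated assumptions, not the thesis's implementation: each post is reduced to its tag list, tags co-occurring on a post form one itemset, only pairwise rules a -> b that pass illustrative `min_support` and `min_confidence` thresholds are kept, and the resulting tag graph is grouped with a simplified, deterministic label propagation chosen here purely as an example of community detection. The function names, thresholds, and toy posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_tag_rules(posts, min_support=0.2, min_confidence=0.5):
    """Mine pairwise association rules a -> b from per-post tag sets.

    Returns {(a, b): confidence}, where confidence = count(a, b) / count(a).
    Thresholds are illustrative assumptions, not values from the thesis.
    """
    n = len(posts)
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in posts:
        unique = sorted(set(tags))  # one itemset per post
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:  # pair too rare overall
            continue
        for x, y in ((a, b), (b, a)):  # check both rule directions
            confidence = count / tag_counts[x]
            if confidence >= min_confidence:
                rules[(x, y)] = confidence
    return rules


def build_graph(rules):
    """Turn rules into an undirected adjacency map of tag -> related tags."""
    graph = {}
    for a, b in rules:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def label_propagation(graph, max_iter=100):
    """Simplified, deterministic label propagation for community detection.

    Each node repeatedly adopts the most frequent label among its neighbours
    (ties broken by the smallest label) until no label changes.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue
            counts = Counter(labels[nb] for nb in graph[node])
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break

    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return list(communities.values())


if __name__ == "__main__":
    # Hypothetical toy data: each inner list is the tag set of one post.
    posts = [
        ["java", "spring"], ["java", "spring"],
        ["java", "hibernate", "spring"], ["java", "hibernate"],
        ["python", "django"], ["python", "django"],
        ["python", "flask"], ["python", "django", "flask"],
    ]
    graph = build_graph(mine_tag_rules(posts))
    print(sorted(sorted(c) for c in label_propagation(graph)))
    # prints [['django', 'flask', 'python'], ['hibernate', 'java', 'spring']]
```

Restricting rules to tag pairs keeps the sketch short; a full Apriori-style miner would also consider larger itemsets. The record does not say which community detection algorithm the thesis uses, so label propagation here could equally be swapped for another method such as Louvain modularity optimisation.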


Similar Items