Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search
With software penetrating into all kinds of traditional or emerging industries, there is a great demand for software development. Given the limited number of developers, one important way to meet this urgent need is to significantly improve developers’ productivity. As the...
Main Author: | Chen, Chunyang |
---|---|
Other Authors: | Liu Yang |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2018 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Software::Software engineering |
Online Access: | http://hdl.handle.net/10356/75873 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-75873 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-75873 2023-03-04T00:52:40Z Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search Chen, Chunyang Liu Yang School of Computer Science and Engineering Xing Zhenchang DRNTU::Engineering::Computer science and engineering::Software::Software engineering With software penetrating into all kinds of traditional or emerging industries, there is a great demand for software development. Given the limited number of developers, one important way to meet this urgent need is to significantly improve developers’ productivity. As the most popular Q&A site, Stack Overflow has accumulated abundant software development knowledge. Effectively leveraging such big data can help developers reuse the experience recorded there and further improve their working efficiency. However, the rich yet unstructured large-scale data in Stack Overflow is difficult to search for two reasons. First, there are too many questions and answers on the site, and there may be a lingual gap (the same meaning can be written in different languages) between the query and the content in Stack Overflow. In addition, the decay of information quality, such as misspellings, inconsistency, and abuse of domain-specific abbreviations, degrades search performance. Second, some higher-order knowledge in Stack Overflow is implicit and must be distilled from the existing raw data before it can be searched. In this thesis, I present methods for supporting developers’ information search over Stack Overflow. To overcome the lexical gap and information decay, I also develop an edit recommendation tool that helps ensure the quality of Stack Overflow posts so that they can be more easily retrieved by a query. But such explicit information search still requires developers to read, understand, and summarize, which is time-consuming. So I propose to shift from document (information) search to entity (knowledge) search by mining the implicit knowledge from tags in Stack Overflow, rendering direct answers to developers instead of asking them to read lengthy documents. I first build a basic software-specific knowledge graph that includes thousands of software-engineering terms and their associations, using association rule mining and community detection. Then, I enrich the knowledge graph with more fine-grained relationships, i.e., analogies among different third-party libraries. Finally, I combine both semantic and lexical information to infer morphological forms of software terms so that the knowledge graph is more robust for knowledge search. Doctor of Philosophy (SCE) 2018-07-02T01:57:25Z 2018-07-02T01:57:25Z 2018 Thesis Chen, C. (2017). Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/75873 10.32657/10356/75873 en 183 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Software::Software engineering |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Software::Software engineering Chen, Chunyang Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search |
description |
With software penetrating into all kinds of traditional or emerging industries, there is a great demand for software development. Given the limited number of developers, one important way to meet this urgent need is to significantly improve developers’ productivity. As the most popular Q&A site, Stack Overflow has accumulated abundant software development knowledge. Effectively leveraging such big data can help developers reuse the experience recorded there and further improve their working efficiency. However, the rich yet unstructured large-scale data in Stack Overflow is difficult to search for two reasons. First, there are too many questions and answers on the site, and there may be a lingual gap (the same meaning can be written in different languages) between the query and the content in Stack Overflow. In addition, the decay of information quality, such as misspellings, inconsistency, and abuse of domain-specific abbreviations, degrades search performance. Second, some higher-order knowledge in Stack Overflow is implicit and must be distilled from the existing raw data before it can be searched. In this thesis, I present methods for supporting developers’ information search over Stack Overflow. To overcome the lexical gap and information decay, I also develop an edit recommendation tool that helps ensure the quality of Stack Overflow posts so that they can be more easily retrieved by a query. But such explicit information search still requires developers to read, understand, and summarize, which is time-consuming. So I propose to shift from document (information) search to entity (knowledge) search by mining the implicit knowledge from tags in Stack Overflow, rendering direct answers to developers instead of asking them to read lengthy documents. I first build a basic software-specific knowledge graph that includes thousands of software-engineering terms and their associations, using association rule mining and community detection. Then, I enrich the knowledge graph with more fine-grained relationships, i.e., analogies among different third-party libraries. Finally, I combine both semantic and lexical information to infer morphological forms of software terms so that the knowledge graph is more robust for knowledge search. |
author2 |
Liu Yang |
author_facet |
Liu Yang Chen, Chunyang |
format |
Theses and Dissertations |
author |
Chen, Chunyang |
author_sort |
Chen, Chunyang |
title |
Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search |
title_short |
Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search |
title_full |
Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search |
title_fullStr |
Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search |
title_full_unstemmed |
Distilling crowd knowledge from software-specific Q&A discussions for assisting developers’ knowledge search |
title_sort |
distilling crowd knowledge from software-specific q&a discussions for assisting developers’ knowledge search |
publishDate |
2018 |
url |
http://hdl.handle.net/10356/75873 |
_version_ |
1759857990944423936 |
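
The abstract mentions building a knowledge graph of software terms from Stack Overflow tags by association rule mining. As a rough illustration of that idea only, the sketch below mines pairwise rules between co-occurring tags using support and confidence. It is not the thesis's implementation: the function name `mine_tag_associations`, the thresholds, and the sample tag sets are all assumptions made up for this example, and the community detection step is not covered.

```python
from collections import Counter
from itertools import combinations


def mine_tag_associations(posts, min_support=0.3, min_confidence=0.6):
    """Return directed rules (head, tail, support, confidence) for tag pairs.

    `posts` is an iterable of tag collections, one per question.
    The thresholds are illustrative defaults, not values from the thesis.
    """
    posts = list(posts)
    n = len(posts)
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in posts:
        unique = sorted(set(tags))  # sort so each pair has one canonical order
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of posts containing both tags
        if support < min_support:
            continue
        for head, tail in ((a, b), (b, a)):
            # confidence of head -> tail: P(tail | head)
            confidence = count / tag_counts[head]
            if confidence >= min_confidence:
                rules.append((head, tail, support, confidence))
    return sorted(rules)


# Hypothetical tag sets, invented for illustration only.
posts = [
    {"python", "pandas", "dataframe"},
    {"python", "pandas"},
    {"python", "numpy"},
    {"java", "spring"},
    {"java", "spring", "hibernate"},
]

rules = mine_tag_associations(posts)
for head, tail, support, confidence in rules:
    print(f"{head} -> {tail} (support={support:.2f}, confidence={confidence:.2f})")
```

Each surviving rule can be read as a candidate edge between two terms in a tag graph; a community detection algorithm could then be run over those edges to group related terms.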