On domain knowledge organization and extraction in software engineering

Bibliographic Details
Main Author: Ye, Deheng
Other Authors: Lin Shang-Wei
Format: Theses and Dissertations
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/69477
Institution: Nanyang Technological University
Description
Summary: Developers' social information seeking on the Web cannot yet benefit from recent advances in semantics-oriented applications, such as knowledge graphs and direct answers. This is largely because existing approaches to analyzing software engineering social content, such as the discussions on Stack Overflow, 1) treat software-specific entities in the same way as other textual content, and 2) fail to consider the semantic linkages between pieces of software knowledge. In this thesis, we perform a pioneering study towards the long-term goal of enabling a domain-specific knowledge graph and semantic search in software engineering. Using the developer-generated content on Stack Overflow, we formulate a series of research problems that are key steps towards this goal. 1) We investigate online knowledge connection in software engineering by analyzing the knowledge network formed by Stack Overflow users' URL sharing activities. Through this study, we obtain an overall understanding of domain knowledge organization, correlation and evolution, which inspires further research on extracting and linking software engineering knowledge. 2) We propose semi-supervised methods for extracting software-specific named entities, such as API mentions, from informal natural language text. 3) We develop automated techniques to link semantically linkable knowledge at the document level, and to link a recognized API mention to its fully qualified form, as it appears in the API documentation, at the entity level. We investigate the development and enhancement of NLP and IR techniques to address the design challenges that the socio-technical nature of software engineering social content brings to these research problems. Extensive experiments show the effectiveness of our proposed approaches for analyzing and solving these problems.