On domain knowledge organization and extraction in software engineering
| Field | Value |
|---|---|
| Format | Theses and Dissertations |
| Language | English |
| Published | 2017 |
| Online Access | http://hdl.handle.net/10356/69477 |
| Institution | Nanyang Technological University |
Summary:

Developers' social information seeking on the Web cannot yet benefit from recent significant advances in semantics-oriented applications, such as knowledge graphs and direct answers.
This is largely because existing approaches to analyzing software engineering social content, such as discussions on Stack Overflow, 1) treat software-specific entities in the same way as other textual content, and 2) fail to consider the semantic linkages between pieces of software knowledge.
In this thesis, we perform a pioneering study towards the long-term goal of enabling domain-specific knowledge graphs and semantic search in software engineering.
Using developer-generated content on Stack Overflow, we formulate a series of research problems that are key steps toward this goal.
These include:
1) We investigate online knowledge connections in software engineering by analyzing the knowledge network formed by Stack Overflow users' URL-sharing activities.
Through this study, we obtain an overall understanding of how domain knowledge is organized, correlated, and evolves, which motivates further research on extracting and linking software engineering knowledge.
2) We propose semi-supervised methods for extracting software-specific named entities, such as API mentions, from informal natural language text.
3) We develop automated techniques to link semantically linkable knowledge at the document level, and to link a recognized API mention to its fully qualified form, as it appears in the API documentation, at the entity level.
We develop and enhance NLP and IR techniques to address the design challenges that the socio-technical nature of software engineering social content poses for these research problems.
Extensive experiments show the effectiveness of the proposed approaches in analyzing and solving these problems.
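As a rough illustration of the first research problem, a knowledge network formed by URL sharing can be sketched by extracting links to Stack Overflow questions from post bodies and counting how often each question is linked. This is a minimal sketch under stated assumptions, not the method used in the thesis: the input format of `(post_id, body_text)` pairs, the URL pattern, and the helper names `build_knowledge_network` and `in_degrees` are all hypothetical, and only question links on stackoverflow.com are considered.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern (an assumption): matches links to Stack Overflow
# questions in either long form (/questions/<id>/...) or short form (/q/<id>).
SO_LINK = re.compile(r"https?://(?:www\.)?stackoverflow\.com/(?:questions|q)/(\d+)")


def build_knowledge_network(posts):
    """Build a directed graph mapping a source post id to the set of question ids it links to.

    `posts` is assumed to be an iterable of (post_id, body_text) pairs.
    Links from a post to itself are ignored, so a post with only
    self-links does not appear in the graph at all.
    """
    graph = defaultdict(set)
    for post_id, body in posts:
        for match in SO_LINK.finditer(body):
            target = int(match.group(1))
            if target != post_id:
                graph[post_id].add(target)
    return dict(graph)


def in_degrees(graph):
    """Count how many distinct source posts link to each question."""
    counts = Counter()
    for targets in graph.values():
        counts.update(targets)
    return counts


if __name__ == "__main__":
    posts = [
        (1, "See https://stackoverflow.com/questions/10/foo and https://stackoverflow.com/q/20"),
        (2, "Possible duplicate of http://stackoverflow.com/questions/10/foo"),
        (3, "Self-link: https://stackoverflow.com/questions/3/bar"),
    ]
    graph = build_knowledge_network(posts)
    print(graph)                               # {1: {10, 20}, 2: {10}}
    print(in_degrees(graph).most_common(1))    # [(10, 2)]
```

Here posts 1 and 2 both link to question 10, so it has the highest in-degree, while post 3 links only to itself and is dropped. A fuller analysis would likely also need to handle answer links, shortened URLs, and links to external documentation.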