Mining Stack Overflow to recommend Java API classes using word embedding and topic modelling / Lee Wai Keat

Bibliographic Details
Main Author: Lee, Wai Keat
Format: Thesis
Published: 2019
Online Access: http://studentsrepo.um.edu.my/13077/1/Lee_Wai_Keat.pdf
http://studentsrepo.um.edu.my/13077/3/Lee_Wai_Keat.pdf
http://studentsrepo.um.edu.my/13077/
Institution: Universiti Malaya
Description
Summary: To reduce development effort, today’s software development technologies rely heavily on reusable components provided by Application Programming Interfaces (APIs). However, studies have found that APIs often have poor usability and that programmers find them difficult to use. A number of factors affect the usability and learning of an API, the most critical being the API documentation. It is therefore unsurprising that developers look for alternative information sources to learn APIs. One such source is the crowd documentation of APIs available on Community Question and Answer (CQA) websites such as Stack Overflow (SO). Studies have shown that the large volume of data in SO makes it suitable for data mining and analytics for APIs. Accordingly, this research aims to: 1) identify Java programmers’ common programming problems based on their level of expertise, by analyzing Java-related duplicate discussion posts in SO (Study 1); and 2) address the lexical gap between natural language queries and Java API documentation, and the lexical gap between natural language queries and Java code, by designing and implementing an approach that recommends Java API classes for programmers’ natural language queries using data mined from SO (Study 2). Java was chosen because it is a long-established and popular programming language, and existing studies have found that SO questions and discussion posts cover the Java API widely. Study 1 found that the novice group is the top contributor of duplicate questions asked in SO, while the expert group contributes significantly fewer. The most common problem Java programmers face is understanding and/or fixing errors, whereas expert programmers more often ask about the reasons behind Java programming concepts.
The approach proposed in Study 2 employs Natural Language Processing techniques, namely word embedding and topic modelling, together with heuristic rules to produce Java API class recommendations. Benchmarking against an existing state-of-the-art approach using four metrics (Top-K accuracy, Mean Recall @ K, Mean Reciprocal Rank @ K and Mean Average Precision @ K) shows that the proposed approach performs better. The approach was implemented as a Java API class recommender running on a server, with an Eclipse IDE plug-in (APIRecJ) as the front end for accessing the recommender’s functionality. The results of the user evaluation study show that APIRecJ is generally useful for finding Java API classes relevant to programmers’ queries. In summary, the contributions of this research are: a set of common Java programming problems and Java API classes that Java programmers struggle with, to which Java educators and learning resources can devote more attention; an approach for recommending relevant Java API classes for programmers’ queries that outperforms an existing state-of-the-art approach; a Java API class recommender; and an Eclipse IDE plug-in that provides assistance on Java API classes relevant to programmers’ queries from within the IDE.
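
For readers unfamiliar with the four ranking metrics named in the summary, the following is a minimal Python sketch of how such metrics are commonly computed for a list of recommended API classes. It is not taken from the thesis: the function names, the example class lists, and the choice to normalise average precision by the number of relevant hits within the top K are all assumptions made for illustration, and the thesis may define the metrics slightly differently.

```python
from typing import Callable, List, Sequence, Set, Tuple


def top_k_hit(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Return 1.0 if any of the first k recommendations is relevant, else 0.0."""
    return 1.0 if any(c in relevant for c in ranked[:k]) else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant classes that appear in the first k recommendations."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant recommendation within the top k, else 0.0."""
    for rank, cls in enumerate(ranked[:k], start=1):
        if cls in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Mean of precision@i over the ranks i <= k that hold a relevant class.

    Assumption: normalised by the number of hits in the top k; other
    definitions divide by min(k, number of relevant classes) instead.
    """
    hits = 0
    precision_sum = 0.0
    for rank, cls in enumerate(ranked[:k], start=1):
        if cls in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


Metric = Callable[[Sequence[str], Set[str], int], float]


def mean_over_queries(metric: Metric,
                      results: List[Tuple[Sequence[str], Set[str]]],
                      k: int) -> float:
    """Average a per-query metric over all (ranked list, relevant set) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in results) / len(results)


if __name__ == "__main__":
    # Hypothetical recommendations and ground truth for two queries.
    results = [
        (["java.util.HashMap", "java.util.List", "java.util.ArrayList"],
         {"java.util.List", "java.util.ArrayList"}),
        (["java.io.File", "java.nio.file.Files", "java.util.Scanner"],
         {"java.io.BufferedReader"}),
    ]
    for name, metric in [("Top-K accuracy", top_k_hit),
                         ("Mean Recall @ K", recall_at_k),
                         ("MRR @ K", reciprocal_rank_at_k),
                         ("MAP @ K", average_precision_at_k)]:
        print(name, mean_over_queries(metric, results, k=3))
```

Averaging each per-query score over all queries gives the "mean" variants reported in the benchmark; Top-K accuracy is simply the mean of the per-query hit indicator.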