Exploiting text mining for Java package mappings

Bibliographic Details
Main Author: Ong, Kent Long Xiong
Other Authors: Liu Yang
Format: Final Year Project
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/70045
Institution: Nanyang Technological University
Description
Summary: Developers often need methods that provide the same functionality from more than one program library, either to obtain the latest optimized implementation or to find a feature they need. For example, a developer may be using the array feature of the program library “org.json” and therefore require methods from the package “org.json.JSONArray” to perform array operations, but “org.json” may no longer be under active development. The developer may then wish to find methods in an analogical program library (e.g. gson) that perform operations on arrays, such as methods from the package “com.google.gson.JsonArray”. This requires a mapping between the two packages. Such mappings are called package mappings. Because the number of package mappings is large, defining them manually is tedious and error-prone. To relieve developers of this process, an automatic technique for building a database of likely package mappings is desirable. This report therefore proposes the use of Term Frequency-Inverse Document Frequency (TF-IDF) to derive package mappings between analogical Java program libraries. The approach applies TF-IDF to package names and their descriptions from the Java documentation to measure similarity and define the package mappings between analogical program libraries. We used Application Programming Interface (API) mappings between four pairs of analogical program libraries as ground truth to evaluate our approach. Our results indicate that the correct analogical API appeared within the top-10 recommended results over 50% of the time. Building on this result, we also present a web application (http://similarpackage.appspot.com/) that can recommend analogical packages for 71,775 packages from 117 pairs of analogical Java program libraries with diverse functionalities.
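As an illustration of the idea described in the summary, the following is a minimal sketch of TF-IDF matching between packages, not the report's actual implementation. It assumes each package is represented by its name concatenated with a short description, splits identifiers on camel-case boundaries, weights terms with a smoothed IDF variant chosen for illustration, and ranks candidates by cosine similarity. The function names, the tokenization rule, and the short descriptions are hypothetical and are not quoted from the report or from any library's Javadoc.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split on non-letters and camel-case boundaries, then lowercase."""
    # "JSONArray" -> "JSON Array" (acronym followed by a capitalized word)
    text = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", text)
    # "JsonArray" -> "Json Array" (lowercase followed by uppercase)
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed IDF, an assumption)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(source, candidates, top_k=10):
    """Rank candidate packages by TF-IDF similarity to the source package.

    source: (name, description) tuple; candidates: {name: description}.
    """
    names = list(candidates)
    docs = [source[0] + " " + source[1]] + [n + " " + candidates[n] for n in names]
    vectors = tfidf_vectors(docs)
    scored = [(n, cosine(vectors[0], v)) for n, v in zip(names, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical descriptions, written for this example only.
source = ("org.json.JSONArray", "An ordered sequence of values in a JSON array")
candidates = {
    "com.google.gson.JsonArray": "An ordered list of elements in a JSON array",
    "com.google.gson.stream.JsonWriter": "Writes a JSON value to a stream one token at a time",
    "com.google.gson.GsonBuilder": "Builds a configured Gson instance",
}

ranked = rank_candidates(source, candidates)
for name, score in ranked:
    print(f"{score:.3f}  {name}")
```

The default `top_k=10` mirrors the top-10 cut-off mentioned in the summary. A real pipeline would draw the descriptions from the libraries' generated documentation and would likely need stop-word handling, which this sketch omits.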