Mining likely analogical APIs across third-party libraries via large-scale unsupervised API semantics embedding

Establishing API mappings between third-party libraries is a prerequisite step for library migration tasks. Manually establishing API mappings is tedious due to the large number of APIs to be examined. Having an automatic technique to create a database of likely API mappings can significantly ease t...

Bibliographic Details
Main Authors: Chen, Chunyang; Xing, Zhenchang; Liu, Yang; Xiong, Kent Long Xiong
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2021
Subjects: Engineering::Computer science and engineering; Libraries; Semantics
Online Access: https://hdl.handle.net/10356/152782
Institution: Nanyang Technological University
id sg-ntu-dr.10356-152782
record_format dspace
spelling sg-ntu-dr.10356-152782 2021-09-29T05:42:55Z
citation Chen, C., Xing, Z., Liu, Y. & Xiong, K. L. X. (2021). Mining likely analogical APIs across third-party libraries via large-scale unsupervised API semantics embedding. IEEE Transactions on Software Engineering, 47(3), 432-447. https://dx.doi.org/10.1109/TSE.2019.2896123
issn 0098-5589
doi 10.1109/TSE.2019.2896123
scopus_eid 2-s2.0-85061316490
rights © 2019 IEEE. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Libraries
Semantics
description Establishing API mappings between third-party libraries is a prerequisite step for library migration tasks. Manually establishing API mappings is tedious due to the large number of APIs to be examined. An automatic technique that creates a database of likely API mappings can significantly ease the task. Unfortunately, existing techniques either adopt a supervised learning mechanism that requires already-ported or functionally similar applications across major programming languages or platforms, which are difficult to come by for an arbitrary pair of third-party libraries, or cannot deal with the lexical gap between the API descriptions of different libraries. To overcome these limitations, we present an unsupervised, deep-learning-based approach that embeds both API usage semantics and API description (name and document) semantics into a vector space for inferring likely analogical API mappings between libraries. Based on deep learning models trained on tens of millions of API call sequences, method names, and comments of 2.8 million methods from 135,127 GitHub projects, our approach significantly outperforms other deep learning and traditional information retrieval (IR) methods for inferring likely analogical APIs. We implement a proof-of-concept website (https://similarapi.appspot.com) that can recommend analogical APIs for 583,501 APIs of 111 pairs of analogical Java libraries with diverse functionalities. A third-party analogical-API database of this scale has never been achieved before.
author2 School of Computer Science and Engineering
format Article
author Chen, Chunyang
Xing, Zhenchang
Liu, Yang
Xiong, Kent Long Xiong
author_sort Chen, Chunyang
title Mining likely analogical APIs across third-party libraries via large-scale unsupervised API semantics embedding
publishDate 2021
url https://hdl.handle.net/10356/152782
_version_ 1712300638197514240