On the effectiveness of pretrained models for API learning

Developers frequently use APIs to implement certain functionalities, such as parsing Excel files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches either utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoders to generate API sequences. The first approach treats queries and API names as bags of words and lacks a deep comprehension of the semantics of the queries. The latter approach adapts a neural language model to encode a user query into a fixed-length context vector and generate API sequences from that context vector. We want to understand the effectiveness of recent Pre-trained Transformer-based Models (PTMs) for the API learning task. These PTMs are trained on large natural language corpora in an unsupervised manner to retain contextual knowledge about the language, and have found success in solving similar Natural Language Processing (NLP) problems. However, the applicability of PTMs has not yet been explored for the API sequence generation task. We empirically evaluate the PTMs on a dataset of 7 million annotations collected from GitHub; this dataset was also used to assess previous approaches. Based on our results, PTMs generate more accurate API sequences and outperform other related methods by ∼11%. We have also identified two different tokenization approaches that can contribute to a significant boost in PTMs' performance for the API sequence generation task.
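The task the abstract describes is mapping a natural-language query to a sequence of API calls with a pretrained sequence-to-sequence Transformer. As a minimal, illustrative sketch of that inference step (using the Hugging Face transformers library; the t5-small checkpoint, prompt, and decoding settings are placeholders, not the paper's fine-tuned model or configuration):

    # Sketch: generate an API call sequence from a natural-language query
    # with a pretrained seq2seq Transformer. Assumes a checkpoint fine-tuned
    # on (query, API sequence) pairs; "t5-small" is only a stand-in here.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    model_name = "t5-small"  # placeholder, not the paper's model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    query = "read a text file line by line"
    inputs = tokenizer(query, return_tensors="pt")

    # Beam search decodes the most likely API sequence for the query.
    output_ids = model.generate(**inputs, max_length=64, num_beams=5)
    api_sequence = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(api_sequence)
    # e.g. "FileReader.new BufferedReader.new BufferedReader.readLine ..."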

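The abstract also credits two tokenization approaches for a significant performance boost but does not name them here. As a purely illustrative sketch of the kind of preprocessing involved, the hypothetical helper below splits dotted, camelCase API names (e.g. BufferedReader.readLine) into lowercase subtokens before subword tokenization; this is an assumption for illustration, not the paper's method:

    import re

    def split_api_name(api_call: str) -> list[str]:
        """Hypothetical helper: split 'BufferedReader.readLine' into
        lowercase subtokens ['buffered', 'reader', 'read', 'line']."""
        tokens = []
        for part in api_call.split("."):
            # Split on camelCase boundaries, acronyms, and digit runs.
            tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
        return [t.lower() for t in tokens]

    print(split_api_name("BufferedReader.readLine"))
    # ['buffered', 'reader', 'read', 'line']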

Bibliographic Details
Main Authors: HADI, Mohammad Abdul; YUSUF, Imam Nur Bani; THUNG, Ferdian; LUONG, Gia Kien; JIANG, Lingxiao; FARD, Fatemeh H.; LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
DOI: 10.1145/3524610.3527886
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Subjects: API; Deep learning; Transformers; Code search; API sequence; API usage; Software Engineering
Online Access:https://ink.library.smu.edu.sg/sis_research/7642
https://ink.library.smu.edu.sg/context/sis_research/article/8645/viewcontent/ICPC22Pretrained.pdf