On the effectiveness of pretrained models for API learning

Developers frequently use APIs to implement certain functionalities, such as parsing Excel files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches either utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoders to generate API sequences. The first approach treats queries and API names as bags of words and lacks a deep comprehension of the semantics of the queries. The latter approach adapts a neural language model to encode a user query into a fixed-length context vector and generate API sequences from that context vector. We want to understand the effectiveness of recent Pre-trained Transformer-based Models (PTMs) for the API learning task. These PTMs are trained on large natural language corpora in an unsupervised manner to retain contextual knowledge about the language, and have found success in solving similar Natural Language Processing (NLP) problems. However, the applicability of PTMs has not yet been explored for the API sequence generation task. We empirically evaluate the PTMs on a dataset of 7 million annotations collected from GitHub; this dataset was also used to assess previous approaches. Based on our results, PTMs generate more accurate API sequences and outperform other related methods by ∼11%. We have also identified two different tokenization approaches that can contribute to a significant boost in PTMs' performance for the API sequence generation task.
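The task the abstract describes is mapping a natural-language query to a sequence of API calls with a pretrained sequence-to-sequence Transformer. As a minimal, illustrative sketch of that inference step (using the Hugging Face transformers library; the t5-small checkpoint, prompt, and decoding settings are placeholders, not the paper's fine-tuned model or configuration):

    # Sketch: generate an API call sequence from a natural-language query
    # with a pretrained seq2seq Transformer. Assumes a checkpoint fine-tuned
    # on (query, API sequence) pairs; "t5-small" is only a stand-in here.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    model_name = "t5-small"  # placeholder, not the paper's model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    query = "read a text file line by line"
    inputs = tokenizer(query, return_tensors="pt")

    # Beam search decodes the most likely API sequence for the query.
    output_ids = model.generate(**inputs, max_length=64, num_beams=5)
    api_sequence = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(api_sequence)
    # e.g. "FileReader.new BufferedReader.new BufferedReader.readLine ..."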

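The abstract also credits two tokenization approaches for a significant performance boost but does not name them here. As a purely illustrative sketch of the kind of preprocessing involved, the hypothetical helper below splits dotted, camelCase API names (e.g. BufferedReader.readLine) into lowercase subtokens before subword tokenization; this is an assumption for illustration, not the paper's method:

    import re

    def split_api_name(api_call: str) -> list[str]:
        """Hypothetical helper: split 'BufferedReader.readLine' into
        lowercase subtokens ['buffered', 'reader', 'read', 'line']."""
        tokens = []
        for part in api_call.split("."):
            # Split on camelCase boundaries, acronyms, and digit runs.
            tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
        return [t.lower() for t in tokens]

    print(split_api_name("BufferedReader.readLine"))
    # ['buffered', 'reader', 'read', 'line']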

Bibliographic Details
Main Authors: HADI, Mohammad Abdul; YUSUF, Imam Nur Bani; THUNG, Ferdian; LUONG, Gia Kien; JIANG, Lingxiao; FARD, Fatemeh H.; LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
DOI: 10.1145/3524610.3527886
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Subjects: API; Deep learning; Transformers; Code search; API sequence; API usage; Software Engineering
Online Access:https://ink.library.smu.edu.sg/sis_research/7642
https://ink.library.smu.edu.sg/context/sis_research/article/8645/viewcontent/ICPC22Pretrained.pdf