On the transferability of pre-trained language models for low-resource programming languages

A recent study by Ahmed and Devanbu reported that fine-tuning multilingual Pre-trained Language Models (PLMs) on a corpus of code drawn from multiple programming languages achieves higher performance than fine-tuning on code written in just one programming language. However, no analysis was made with respect to fine-tuning monolingual PLMs. Furthermore, some programming languages are inherently different, and code written in one language usually cannot be interchanged with code in another, e.g., Ruby and Java code have very different structures. To better understand how monolingual and multilingual PLMs affect different programming languages, we investigate 1) the performance of PLMs on Ruby for two popular Software Engineering tasks: Code Summarization and Code Search, 2) the strategy (for selecting programming languages) that works well when fine-tuning multilingual PLMs for Ruby, and 3) the performance of the fine-tuned PLMs on Ruby given different code lengths.


Bibliographic Details
Main Authors: CHEN, Fuxiang, FARD, Fatemeh H., LO, David, BRYKSIN, Timofey
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Subjects: Pre-trained language models; Low-resource languages; Databases and Information Systems
DOI: 10.1145/3524610.3527917
Online Access: https://ink.library.smu.edu.sg/sis_research/7693
https://ink.library.smu.edu.sg/context/sis_research/article/8696/viewcontent/On_the_transfer.pdf
Institution: Singapore Management University
Collection: InK@SMU, Research Collection School Of Computing and Information Systems
License: http://creativecommons.org/licenses/by-nc-nd/4.0/