On the transferability of pre-trained language models for low-resource programming languages

A recent study by Ahmed and Devanbu reported that fine-tuning multilingual Pre-trained Language Models (PLMs) on a corpus of code drawn from multiple programming languages achieves higher performance than fine-tuning on code written in just one programming language. However, no analysis was made with respect to fine-tuning monolingual PLMs. Furthermore, some programming languages are inherently different, and code written in one language usually cannot be interchanged with code in another, e.g., Ruby and Java code have very different structures. To better understand how monolingual and multilingual PLMs affect different programming languages, we investigate 1) the performance of PLMs on Ruby for two popular Software Engineering tasks: Code Summarization and Code Search, 2) the strategy (for selecting programming languages) that works well when fine-tuning multilingual PLMs for Ruby, and 3) the performance of the fine-tuned PLMs on Ruby given different code lengths.


Bibliographic Details
Main Authors: CHEN, Fuxiang, FARD, Fatemeh H., LO, David, BRYKSIN, Timofey
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Subjects: Pre-trained language models; Low-resource languages; Databases and Information Systems
DOI: 10.1145/3524610.3527917
Online Access: https://ink.library.smu.edu.sg/sis_research/7693
https://ink.library.smu.edu.sg/context/sis_research/article/8696/viewcontent/On_the_transfer.pdf
Institution: Singapore Management University
Collection: InK@SMU, Research Collection School Of Computing and Information Systems
License: http://creativecommons.org/licenses/by-nc-nd/4.0/