FT2Ra: A fine-tuning-inspired approach to retrieval-augmented code completion

The rise of code pre-trained models has significantly enhanced various coding tasks, such as code completion, and tools like GitHub Copilot. However, the substantial size of these models, especially large models, poses a significant challenge when it comes to fine-tuning them for specific downstream tasks. As an alternative, retrieval-based methods have emerged as a promising solution, augmenting model predictions without the need for fine-tuning. Despite their potential, a significant challenge is that the designs of these methods often rely on heuristics, leaving open critical questions about what information should be stored or retrieved and how to interpolate such information to augment predictions. To tackle this challenge, we first perform a theoretical analysis of the fine-tuning process, highlighting the importance of delta logits as a catalyst for improving model predictions. Building on this insight, we develop a novel retrieval-based method, FT2Ra, which aims to mimic genuine fine-tuning. While FT2Ra adopts a retrieval-based mechanism, it uniquely employs a paradigm with a learning rate and multi-epoch retrievals, similar to fine-tuning. We conducted a comprehensive evaluation of FT2Ra on both token-level and line-level code completion. Our findings demonstrate the remarkable effectiveness of FT2Ra compared to state-of-the-art methods, as well as its potential to approach genuine fine-tuning. In token-level completion, a relatively easier task, FT2Ra achieves a 4.29% improvement in accuracy over the best baseline method on UniXcoder. In the more challenging line-level completion task, we observe a more than twofold increase in Exact Match (EM) performance, indicating the significant advantages of our theoretical analysis. Notably, even when operating without actual fine-tuning, FT2Ra exhibits competitive performance compared to models with real fine-tuning.
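
The retrieval-and-interpolation idea summarized above can be made concrete with a short sketch. This is an illustration only, not the authors' implementation: the (hidden state, delta logits) datastore layout, all function names, and the default values of k, lr, and epochs are assumptions made for exposition, and the update rule is deliberately simplified.

    import numpy as np

    def retrieve_neighbors(hidden, datastore, k=8):
        """Return the k entries whose keys are closest (L2 distance) to the
        query hidden state. `datastore` is a hypothetical list of
        (context_vector, delta_logits) pairs harvested from training data."""
        dists = [np.linalg.norm(hidden - key) for key, _ in datastore]
        nearest = np.argsort(dists)[:k]
        return [datastore[i] for i in nearest]

    def ft2ra_sketch(logits, hidden, datastore, lr=0.3, k=8, epochs=3):
        """Mimic a few fine-tuning steps without touching model weights:
        each retrieval 'epoch' nudges the logits toward the neighbor-averaged
        delta logits, scaled by a learning rate. (In this toy version the
        neighbor set never changes, so epochs merely rescale the update;
        the paper's multi-epoch retrieval is more involved.)"""
        for _ in range(epochs):
            neighbors = retrieve_neighbors(hidden, datastore, k)
            delta = np.mean([d for _, d in neighbors], axis=0)
            logits = logits + lr * delta
        return logits

    # Toy usage: 5-token vocabulary, 4-dim hidden states, random datastore.
    rng = np.random.default_rng(0)
    datastore = [(rng.normal(size=4), rng.normal(size=5)) for _ in range(100)]
    adjusted = ft2ra_sketch(rng.normal(size=5), rng.normal(size=4), datastore)
    print("predicted token id:", int(np.argmax(adjusted)))
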

Bibliographic Details
Main Authors: GUO, Qi, LIU, Shangqing, XIE, Xiaofei, TANG, Ze
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/9444
https://ink.library.smu.edu.sg/context/sis_research/article/10444/viewcontent/FT2Ra__A_Fine_Tuning_Inspired_Approach_to_Retrieval_Augmented_Code_Completion.pdf
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-10444
record_format dspace
spelling sg-smu-ink.sis_research-10444 2024-11-11T08:06:10Z FT2Ra: A fine-tuning-inspired approach to retrieval-augmented code completion GUO, Qi LIU, Shangqing XIE, Xiaofei TANG, Ze The rise of code pre-trained models has significantly enhanced various coding tasks, such as code completion, and tools like GitHub Copilot. However, the substantial size of these models, especially large models, poses a significant challenge when it comes to fine-tuning them for specific downstream tasks. As an alternative, retrieval-based methods have emerged as a promising solution, augmenting model predictions without the need for fine-tuning. Despite their potential, a significant challenge is that the designs of these methods often rely on heuristics, leaving open critical questions about what information should be stored or retrieved and how to interpolate such information to augment predictions. To tackle this challenge, we first perform a theoretical analysis of the fine-tuning process, highlighting the importance of delta logits as a catalyst for improving model predictions. Building on this insight, we develop a novel retrieval-based method, FT2Ra, which aims to mimic genuine fine-tuning. While FT2Ra adopts a retrieval-based mechanism, it uniquely employs a paradigm with a learning rate and multi-epoch retrievals, similar to fine-tuning. We conducted a comprehensive evaluation of FT2Ra on both token-level and line-level code completion. Our findings demonstrate the remarkable effectiveness of FT2Ra compared to state-of-the-art methods, as well as its potential to approach genuine fine-tuning. In token-level completion, a relatively easier task, FT2Ra achieves a 4.29% improvement in accuracy over the best baseline method on UniXcoder. In the more challenging line-level completion task, we observe a more than twofold increase in Exact Match (EM) performance, indicating the significant advantages of our theoretical analysis. Notably, even when operating without actual fine-tuning, FT2Ra exhibits competitive performance compared to models with real fine-tuning. 2024-09-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9444 info:doi/10.1145/3650212.3652130 https://ink.library.smu.edu.sg/context/sis_research/article/10444/viewcontent/FT2Ra__A_Fine_Tuning_Inspired_Approach_to_Retrieval_Augmented_Code_Completion.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Code completions Critical questions Down-stream Fine tuning Language model Large models Model prediction Code completion Retrieval-augmented language model Databases and Information Systems Theory and Algorithms
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Code completions
Critical questions
Down-stream
Fine tuning
Language model
Large models
Model prediction
Code completion
Retrieval-augmented language model
Databases and Information Systems
Theory and Algorithms
spellingShingle Code completions
Critical questions
Down-stream
Fine tuning
Language model
Large models
Model prediction
Code completion
Retrieval-augmented language model
Databases and Information Systems
Theory and Algorithms
GUO, Qi
LIU, Shangqing
XIE, Xiaofei
TANG, Ze
FT2Ra: A fine-tuning-inspired approach to retrieval-augmented code completion
description The rise of code pre-trained models has significantly enhanced various coding tasks, such as code completion, and tools like GitHub Copilot. However, the substantial size of these models, especially large models, poses a significant challenge when it comes to fine-tuning them for specific downstream tasks. As an alternative, retrieval-based methods have emerged as a promising solution, augmenting model predictions without the need for fine-tuning. Despite their potential, a significant challenge is that the designs of these methods often rely on heuristics, leaving open critical questions about what information should be stored or retrieved and how to interpolate such information to augment predictions. To tackle this challenge, we first perform a theoretical analysis of the fine-tuning process, highlighting the importance of delta logits as a catalyst for improving model predictions. Building on this insight, we develop a novel retrieval-based method, FT2Ra, which aims to mimic genuine fine-tuning. While FT2Ra adopts a retrieval-based mechanism, it uniquely employs a paradigm with a learning rate and multi-epoch retrievals, similar to fine-tuning. We conducted a comprehensive evaluation of FT2Ra on both token-level and line-level code completion. Our findings demonstrate the remarkable effectiveness of FT2Ra compared to state-of-the-art methods, as well as its potential to approach genuine fine-tuning. In token-level completion, a relatively easier task, FT2Ra achieves a 4.29% improvement in accuracy over the best baseline method on UniXcoder. In the more challenging line-level completion task, we observe a more than twofold increase in Exact Match (EM) performance, indicating the significant advantages of our theoretical analysis. Notably, even when operating without actual fine-tuning, FT2Ra exhibits competitive performance compared to models with real fine-tuning.
format text
author GUO, Qi
LIU, Shangqing
XIE, Xiaofei
TANG, Ze
author_facet GUO, Qi
LIU, Shangqing
XIE, Xiaofei
TANG, Ze
author_sort GUO, Qi
title FT2Ra: A fine-tuning-inspired approach to retrieval-augmented code completion
title_short FT2Ra: A fine-tuning-inspired approach to retrieval-augmented code completion
title_full FT2Ra: A fine-tuning-inspired approach to retrieval-augmented code completion
title_fullStr FT2Ra: A fine-tuning-inspired approach to retrieval-augmented code completion
title_full_unstemmed FT2Ra: A fine-tuning-inspired approach to retrieval-augmented code completion
title_sort ft2ra: a fine-tuning-inspired approach to retrieval-augmented code completion
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9444
https://ink.library.smu.edu.sg/context/sis_research/article/10444/viewcontent/FT2Ra__A_Fine_Tuning_Inspired_Approach_to_Retrieval_Augmented_Code_Completion.pdf
_version_ 1816859075461251072