Developing locally trainable large language models
Main Author:
Other Authors:
Format: Thesis - Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2025
Subjects:
Online Access: https://hdl.handle.net/10356/182242
Institution: Nanyang Technological University
Summary: Emerging Large Language Models (LLMs) like GPT-3.5 and GPT-4 have been fundamentally transforming human society since their launch, as they demonstrate groundbreaking capabilities across various tasks. However, the colossal size of such LLMs results in prohibitive training costs and computing resource requirements. It is therefore nearly impossible for individual researchers and smaller entities to train and own LLMs on par with closed-source ones.

This thesis addresses the urgent need for better training methods that enable smaller LLMs to approach or even surpass the capabilities of their larger counterparts without the excessive cost. Firstly, we present a survey of open-source LLMs that consolidates the state-of-the-art open-source models across various capabilities and discusses good practices and pitfalls in training LLMs, since pretraining an LLM from scratch locally is too resource intensive and finetuning an open-source LLM offers a promising alternative. Secondly, we explore Parameter-Efficient Finetuning (PEFT) methods that introduce minimal additional parameters to adapt LLMs with a modular design, demonstrating performance in a challenging continual learning setting that approaches full-model finetuning. Thirdly, we extend the PEFT framework to enhance the in-context learning ability of small LMs by translating demonstration examples into soft prompts. Lastly, we improve black-box distillation by personalizing and modularizing the knowledge transfer from large to small models, outperforming conventional distillation with only one third of the data.

We believe the work presented in this thesis better equips researchers to develop locally trainable LLMs that can compete with their closed-source counterparts, democratizing the benefits of cutting-edge AI technologies more broadly.
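As a rough, hypothetical illustration of the kind of parameter-efficient adaptation the summary refers to, the sketch below wraps a frozen linear layer with a generic LoRA-style low-rank adapter in PyTorch. The class name, rank, and scaling factor are illustrative assumptions and do not reproduce the specific PEFT designs developed in the thesis.

```python
# Illustrative sketch only: a generic LoRA-style adapter, not the thesis's method.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pretrained linear layer and adds a low-rank trainable update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # adapter starts as a zero update
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus a small trainable low-rank correction.
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

# Example: only the adapter's parameters are trainable.
layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # ~65K trainable vs ~16.8M frozen
```

Because the low-rank matrices can be attached to or detached from the frozen backbone independently, adapters of this kind are one common way to obtain the modular, low-overhead adaptation described in the summary.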