Developing locally trainable large language models


Bibliographic Details
Main Author: Chen, Hailin
Other Authors: Joty, Shafiq Rayhan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2025
Online Access: https://hdl.handle.net/10356/182242
Institution: Nanyang Technological University
Description
Summary: Emerging Large Language Models (LLMs) like GPT-3.5 and GPT-4 have been fundamentally transforming human society since their launch, as they demonstrate groundbreaking capabilities across various tasks. However, the colossal size of such LLMs results in prohibitive training costs and computing resource requirements, making it nearly impossible for individual researchers and smaller organizations to train and own LLMs on par with closed-source ones. This thesis addresses the urgent need for better training methods that enable smaller LLMs to approach or even surpass the capabilities of their larger counterparts without the excessive cost. Firstly, we present a survey of open-source LLMs that consolidates the state-of-the-art open-source LLMs across various capabilities and discusses good practices and pitfalls in training LLMs; since locally pretraining an LLM from scratch is too resource intensive, finetuning an open-source LLM provides a promising alternative. Secondly, we explore Parameter-Efficient Finetuning (PEFT) methods that introduce minimal additional parameters to adapt LLMs with a modular design, demonstrating performance in a challenging continual learning setting that approaches that of full-model finetuning. Thirdly, we extend the PEFT framework to enhance the in-context learning ability of small LMs by translating demonstration examples into soft prompts. Lastly, we improve black-box distillation by personalizing and modularizing the knowledge transfer from large to small models, outperforming conventional distillation with only one-third of the data. We believe the work presented in this thesis better equips researchers to develop locally trainable LLMs that can compete with their closed-source counterparts, democratizing the benefits of cutting-edge AI technologies to a wider community.