Developing locally trainable large language models

Bibliographic Details
Main Author: Chen, Hailin
Other Authors: Joty, Shafiq Rayhan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2025
Subjects:
Online Access: https://hdl.handle.net/10356/182242
Institution: Nanyang Technological University
Physical Description
Summary: Emerging Large Language Models (LLMs) like GPT-3.5 and GPT-4 have been fundamentally transforming human society since their launch, as they demonstrate groundbreaking capabilities across various tasks. However, the colossal size of such LLMs results in a prohibitive training cost and requires extensive computing resources. It is therefore nearly impossible for individual researchers and smaller entities to train and own LLMs on par with closed-source ones. This thesis addresses the urgent need for better training methods that enable smaller LLMs to approach or even surpass the capabilities of their larger counterparts without the excessive cost. First, we present a survey on open-source LLMs that consolidates the state-of-the-art open-source LLMs across various capabilities and discusses good practices and pitfalls of training LLMs, since pretraining an LLM from scratch locally is too resource-intensive and finetuning an open-source LLM provides a promising alternative. Second, we explore Parameter-Efficient Finetuning (PEFT) methods that introduce minimal additional parameters to adapt LLMs with a modular design, demonstrating performance in challenging continual learning settings that approaches that of full-model finetuning. Third, we extend the PEFT framework to enhance the in-context learning ability of small LMs by translating demonstration examples into soft prompts. Lastly, we improve black-box distillation by personalizing and modularizing the knowledge transfer from large to small models, outperforming conventional distillation with only one-third of the data. We believe the work presented in this thesis better equips researchers to develop locally trainable LLMs that can compete with their closed-source counterparts, democratizing the benefits of cutting-edge AI technologies to a wider community.
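
The parameter-efficient finetuning idea mentioned in the summary can be illustrated with a minimal LoRA-style adapter sketch: the pretrained weights stay frozen and only a small low-rank module is trained. This is an illustrative example only, not the specific method developed in the thesis; the class name, rank, and layer dimensions are hypothetical, and it assumes PyTorch.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Wraps a frozen linear layer with a small trainable low-rank update.

        Illustrative PEFT sketch: only the factors A and B (rank r) are trained,
        while the pretrained base weights remain frozen.
        """

        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weights

            # Low-rank update: W_eff = W + (alpha / r) * B @ A
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scaling = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

    # Usage: adapt a single projection layer; only ~2 * r * d parameters are trainable.
    layer = LoRALinear(nn.Linear(768, 768), r=8)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    print(f"trainable parameters: {trainable}")

Because the added module is small and self-contained, it can be swapped in per task, which is one way such adapters support the modular, continual-learning use described above.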