Developing locally trainable large language models

Emerging Large Language Models (LLMs) such as GPT-3.5 and GPT-4 have been fundamentally transforming human society since their launch, demonstrating groundbreaking capabilities across a wide range of tasks. However, the colossal size of such LLMs results in prohibitive training costs and computing requirements, making it nearly impossible for individual researchers and smaller organizations to train and own LLMs on par with closed-source ones. This thesis addresses the urgent need for better training methods that enable smaller LLMs to approach, or even surpass, the capabilities of their larger counterparts without excessive cost. First, we present a survey of open-source LLMs that consolidates the state-of-the-art open-source models across various capabilities and discusses good practices and pitfalls of training LLMs, since pretraining an LLM from scratch locally is too resource intensive and finetuning an open-source LLM offers a promising alternative. Second, we explore Parameter-Efficient Finetuning (PEFT) methods that introduce a small number of additional parameters to adapt LLMs with a modular design, approaching full-model finetuning performance in a challenging continual learning setting. Third, we extend the PEFT framework to enhance the in-context learning ability of small LMs by translating demonstration examples into soft prompts. Finally, we improve black-box distillation by personalizing and modularizing the knowledge transfer from large to small models, outperforming conventional distillation with only one third of the data. We believe the work presented in this thesis better equips researchers to develop locally trainable LLMs that can compete with closed-source counterparts, democratizing the benefits of cutting-edge AI technologies to a broader community.
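As an illustration of the PEFT idea summarized above, the sketch below shows a minimal LoRA-style adapter in PyTorch: the pretrained linear layer is frozen and only a small low-rank update is trained. This is a generic example, not the specific modular PEFT method developed in the thesis; the class name LoRALinear and the hyperparameters (rank r = 8, scaling alpha = 16) are assumptions chosen for the sketch.

# Illustrative sketch only: a minimal LoRA-style adapter, one common PEFT technique.
# Not the thesis's method; names and hyperparameters are assumptions for the example.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen pretrained path plus low-rank trainable path; only A and B get gradients.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(768, 768))
    y = layer(torch.randn(2, 768))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"output {tuple(y.shape)}, trainable params: {trainable}/{total}")

Because only the low-rank matrices receive gradients, the trainable parameter count stays a small fraction of the full model, which is what makes local finetuning of larger backbones feasible in practice.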

Bibliographic Details
Main Author: Chen, Hailin
Other Authors: Joty, Shafiq Rayhan; Sun, Aixin (College of Computing and Data Science)
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2025
Subjects: Computer and Information Science
Citation: Chen, H. (2025). Developing locally trainable large language models. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182242
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Online Access: https://hdl.handle.net/10356/182242