Developing locally trainable large language models

Emerging Large Language Models (LLMs) such as GPT-3.5 and GPT-4 have been fundamentally transforming human society since their launch, demonstrating groundbreaking capabilities across a wide range of tasks. However, the colossal size of such LLMs results in prohibitive training costs and computing requirements, making it nearly impossible for individual researchers and smaller organizations to train and own LLMs on par with closed-source ones. This thesis addresses the urgent need for better training methods that enable smaller LLMs to approach, or even surpass, the capabilities of their larger counterparts without excessive cost. First, we present a survey of open-source LLMs that consolidates the state-of-the-art open-source models across various capabilities and discusses good practices and pitfalls of training LLMs, since pretraining an LLM from scratch locally is too resource intensive and finetuning an open-source LLM offers a promising alternative. Second, we explore Parameter-Efficient Finetuning (PEFT) methods that introduce a small number of additional parameters to adapt LLMs with a modular design, approaching full-model finetuning performance in a challenging continual learning setting. Third, we extend the PEFT framework to enhance the in-context learning ability of small LMs by translating demonstration examples into soft prompts. Finally, we improve black-box distillation by personalizing and modularizing the knowledge transfer from large to small models, outperforming conventional distillation with only one third of the data. We believe the work presented in this thesis better equips researchers to develop locally trainable LLMs that can compete with closed-source counterparts, democratizing the benefits of cutting-edge AI technologies to a broader community.
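As an illustration of the PEFT idea summarized above, the sketch below shows a minimal LoRA-style adapter in PyTorch: the pretrained linear layer is frozen and only a small low-rank update is trained. This is a generic example, not the specific modular PEFT method developed in the thesis; the class name LoRALinear and the hyperparameters (rank r = 8, scaling alpha = 16) are assumptions chosen for the sketch.

# Illustrative sketch only: a minimal LoRA-style adapter, one common PEFT technique.
# Not the thesis's method; names and hyperparameters are assumptions for the example.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen pretrained path plus low-rank trainable path; only A and B get gradients.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

if __name__ == "__main__":
    layer = LoRALinear(nn.Linear(768, 768))
    y = layer(torch.randn(2, 768))
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"output {tuple(y.shape)}, trainable params: {trainable}/{total}")

Because only the low-rank matrices receive gradients, the trainable parameter count stays a small fraction of the full model, which is what makes local finetuning of larger backbones feasible in practice.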

Bibliographic Details
Main Author: Chen, Hailin
Other Authors: Joty, Shafiq Rayhan; Sun, Aixin (College of Computing and Data Science)
Format: Thesis (Doctor of Philosophy)
Language: English
Published: Nanyang Technological University, 2025
Subjects: Computer and Information Science
Citation: Chen, H. (2025). Developing locally trainable large language models. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/182242
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Online Access: https://hdl.handle.net/10356/182242