Inference acceleration of large language models



Bibliographic Details
Main Author: Zhang, Boyu
Other Authors: Mao Kezhi
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/181660
Physical Description
Summary: This dissertation examines the challenges and bottlenecks faced by current large language models during inference from three core perspectives: data, model, and system. Through detailed research, key factors affecting inference speed are identified, encompassing data processing efficiency, model structure complexity, and system resource allocation and utilization. Building on this foundation, I review and interpret previous research in this field, systematically summarizing its core ideas, implementation pathways, and achievements. By analyzing these studies in depth, I not only highlight their respective strengths and weaknesses but also propose targeted improvement suggestions in line with current technological trends.