Inference acceleration of large language models


Bibliographic Details
Main Author: Zhang, Boyu
Other Authors: Mao Kezhi
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/181660
Institution: Nanyang Technological University
Description
Summary: This dissertation examines the challenges and bottlenecks that current large language models face during inference from three core perspectives: data, model, and system. It identifies the key factors affecting inference speed, including data processing efficiency, model structural complexity, and system resource allocation and utilization. Building on this foundation, I review and interpret previous research in this field, systematically summarizing its core ideas, implementation pathways, and achievements. Through in-depth analysis of these studies, I not only highlight their respective strengths and weaknesses but also propose targeted improvements in line with current technological trends.