Inference acceleration of large language models
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/181660 |
Institution: | Nanyang Technological University |
Summary: | This dissertation delves into the challenges and bottlenecks faced by current
large language models during inference from three core perspectives: data, model,
and system. Through meticulous research, key factors impacting inference speed
are identified, encompassing data processing efficiency, model structure complexity,
and system resource allocation and utilization. Building on this foundation,
I review and interpret previous research in this field, systematically summarizing
its core ideas, implementation pathways, and achievements. By analyzing
these studies in depth, I not only highlight their respective strengths and weaknesses
but also propose targeted improvements in line with current technological
trends. |
---|