Inference acceleration of large language models
This dissertation delves into the challenges and bottlenecks faced by current large language models during inference from three core perspectives: data, model, and system. Through meticulous research, key factors impacting inference speed are identified, encompassing data processing efficiency, model structure complexity, and system resource allocation and utilization. Building on this foundation, I review and interpret previous studies in this field, systematically summarizing their core ideas, implementation pathways, and achievements. By deeply analyzing these studies, I not only highlight their respective strengths and weaknesses but also propose targeted improvement suggestions in line with current technological trends.
Saved in: DR-NTU (NTU Library, Nanyang Technological University)
Main Author: Zhang, Boyu
Other Authors: Mao Kezhi (School of Electrical and Electronic Engineering)
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Large language model; Quantization; Approximate computation; Self-attention; Transformer
Online Access: https://hdl.handle.net/10356/181660
Citation: Zhang, B. (2024). Inference acceleration of large language models. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181660