Inference acceleration of large language models
This dissertation delves into the challenges and bottlenecks faced by current large language models during inference from three core perspectives: data, model, and system. Through meticulous research, key factors impacting inference speed are identified, encompassing data processing efficiency, m...
| Main Author: | Zhang, Boyu |
|---|---|
| Other Authors: | Mao Kezhi |
| Format: | Thesis-Master by Coursework |
| Language: | English |
| Published: | Nanyang Technological University, 2024 |
| Online Access: | https://hdl.handle.net/10356/181660 |
| Institution: | Nanyang Technological University |
Similar Items
- Enhancing online safety: leveraging large language models for community moderation in Singlish dialect
  by: Goh, Zheng Ying
  Published: (2024)
- Efficient inference offloading for mixture-of-experts large language models in internet of medical things
  by: Yuan, Xiaoming, et al.
  Published: (2024)
- Heuristic development in the use of large language models for materials science
  by: Chye, Vincent Zhen Guang
  Published: (2024)
- QuantfolioX: portfolio management application using large language model technology
  by: Teo, Charlotte Xuan Qin
  Published: (2024)
- Bias problems in large language models and how to mitigate them
  by: Ong, Adrian Zhi Ying
  Published: (2024)