Inference acceleration of large language models
This dissertation examines the challenges and bottlenecks that current large language models face during inference from three core perspectives: data, model, and system. Key factors affecting inference speed are identified, including data processing efficiency, m...
| Main Author: | Zhang, Boyu |
|---|---|
| Other Authors: | Mao Kezhi |
| Format: | Thesis-Master by Coursework |
| Language: | English |
| Published: | Nanyang Technological University, 2024 |
| Subjects: | |
| Online Access: | https://hdl.handle.net/10356/181660 |
Similar Items

- Enhancing online safety: leveraging large language models for community moderation in Singlish dialect
  By: Goh, Zheng Ying
  Published: (2024)
- Optimizing large language model inference
  By: Shao, Siyang
  Published: (2025)
- Efficient inference offloading for mixture-of-experts large language models in internet of medical things
  By: Yuan, Xiaoming, et al.
  Published: (2024)
- Heuristic development in the use of large language models for materials science
  By: Chye, Vincent Zhen Guang
  Published: (2024)
- QuantfolioX: portfolio management application using large language model technology
  By: Teo, Charlotte Xuan Qin
  Published: (2024)