Towards faster inference of transformers: Strategies for accelerating decoding processes
This thesis delves into the acceleration and optimization of Transformer inference, a subject of increasing importance with the emergence of Large Language Models (LLMs). The study primarily addresses the challenges posed by two inherent properties of Transformers during inference: the quadratic com...
Main Author: | |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/etd_coll/613 https://ink.library.smu.edu.sg/context/etd_coll/article/1611/viewcontent/GPIS_AY2019_PhD_CunxiaoDu.pdf |
Institution: | Singapore Management University |