Towards faster inference of transformers: Strategies for accelerating decoding processes
This thesis investigates the acceleration and optimization of Transformer inference, a subject of growing importance with the emergence of Large Language Models (LLMs). The study primarily addresses the challenges posed by two inherent properties of Transformers during inference: the quadratic complexity...
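The inference bottleneck the abstract names can be illustrated with a toy autoregressive decoder. The sketch below is hypothetical and not from the thesis: `embed`, `attend`, and `generate` are made-up stand-ins, but they show the standard KV-cache idea — each decoding step appends one key/value pair and attends over the cache, rather than recomputing attention inputs for the whole sequence.

```python
import math

def embed(tok, d=4):
    # Toy deterministic "embedding" (a hypothetical stand-in for a learned table).
    return [math.sin((tok + 1) * (i + 1)) for i in range(d)]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all cached keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                              # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / z for i in range(d)]

def generate(prompt, n_new):
    # KV cache: keys/values for past positions are computed once and reused,
    # so each new token costs O(n) attention work instead of reprocessing
    # the entire prefix from scratch.
    cache_k = [embed(t) for t in prompt]
    cache_v = [embed(t) for t in prompt]
    toks = list(prompt)
    for _ in range(n_new):
        q = embed(toks[-1])                      # query for the newest position only
        ctx = attend(q, cache_k, cache_v)
        nxt = round(sum(ctx) * 100) % 10         # toy "argmax" picking the next token
        toks.append(nxt)
        cache_k.append(embed(nxt))               # append just the new position's K/V
        cache_v.append(embed(nxt))
    return toks
```

Even with the cache, step *t* still attends over all *t* earlier positions, so total work across a sequence remains quadratic — which is the property the thesis targets.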
Saved in:
Main Author: DU, Cunxiao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Online Access:
https://ink.library.smu.edu.sg/etd_coll/613
https://ink.library.smu.edu.sg/context/etd_coll/article/1611/viewcontent/GPIS_AY2019_PhD_CunxiaoDu.pdf
Institution: Singapore Management University
Similar Items
- Eye-tracking monitoring based on PMUT arrays
  by: SUN, Sheng, et al. Published: (2021)
- Learning and evaluating Chinese idiom embeddings
  by: TAN, Minghuan, et al. Published: (2021)
- Sound and complete witnesses for template-based verification of LTL properties on polynomial programs
  by: CHATTERJEE, Krishnendu, et al. Published: (2024)
- ReEvo: Large language models as hyper-heuristics with reflective evolution
  by: YE, Haoran, et al. Published: (2024)
- Multi-head attention graph convolutional network model: End-to-end entity and relation joint extraction based on multi-head attention graph convolutional network
  by: TAO, Zhihua, et al. Published: (2023)