Towards faster inference of transformers: Strategies for accelerating decoding processes

This thesis delves into the acceleration and optimization of Transformer inference, a subject of increasing importance with the emergence of Large Language Models (LLMs). The study primarily addresses the challenges posed by two inherent properties of Transformers during inference: the quadratic com...

Full description

Saved in:
Bibliographic Details
Main Author: DU, Cunxiao
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects:
Online Access:https://ink.library.smu.edu.sg/etd_coll/613
https://ink.library.smu.edu.sg/context/etd_coll/article/1611/viewcontent/GPIS_AY2019_PhD_CunxiaoDu.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English