Towards faster inference of transformers: Strategies for accelerating decoding processes
This thesis investigates the acceleration and optimization of Transformer inference, a subject of growing importance with the emergence of Large Language Models (LLMs). The study primarily addresses the challenges posed by two inherent properties of Transformers during inference: the quadratic complexity...
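The inference bottleneck the abstract names can be illustrated with a toy autoregressive decoder. The sketch below is hypothetical and not from the thesis: `embed`, `attend`, and `generate` are made-up stand-ins, but they show the standard KV-cache idea — each decoding step appends one key/value pair and attends over the cache, rather than recomputing attention inputs for the whole sequence.

```python
import math

def embed(tok, d=4):
    # Toy deterministic "embedding" (a hypothetical stand-in for a learned table).
    return [math.sin((tok + 1) * (i + 1)) for i in range(d)]

def attend(q, keys, values):
    # Scaled dot-product attention of one query over all cached keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                              # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / z for i in range(d)]

def generate(prompt, n_new):
    # KV cache: keys/values for past positions are computed once and reused,
    # so each new token costs O(n) attention work instead of reprocessing
    # the entire prefix from scratch.
    cache_k = [embed(t) for t in prompt]
    cache_v = [embed(t) for t in prompt]
    toks = list(prompt)
    for _ in range(n_new):
        q = embed(toks[-1])                      # query for the newest position only
        ctx = attend(q, cache_k, cache_v)
        nxt = round(sum(ctx) * 100) % 10         # toy "argmax" picking the next token
        toks.append(nxt)
        cache_k.append(embed(nxt))               # append just the new position's K/V
        cache_v.append(embed(nxt))
    return toks
```

Even with the cache, step *t* still attends over all *t* earlier positions, so total work across a sequence remains quadratic — which is the property the thesis targets.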
Saved in:
Main Author: DU, Cunxiao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Online Access:
https://ink.library.smu.edu.sg/etd_coll/613
https://ink.library.smu.edu.sg/context/etd_coll/article/1611/viewcontent/GPIS_AY2019_PhD_CunxiaoDu.pdf
Institution: Singapore Management University
Similar Items
- Eye-tracking monitoring based on PMUT arrays
  by: SUN, Sheng, et al. Published: (2021)
- Learning and evaluating Chinese idiom embeddings
  by: TAN, Minghuan, et al. Published: (2021)
- Sound and complete witnesses for template-based verification of LTL properties on polynomial programs
  by: CHATTERJEE, Krishnendu, et al. Published: (2024)
- ReEvo: Large language models as hyper-heuristics with reflective evolution
  by: YE, Haoran, et al. Published: (2024)
- Multi-head attention graph convolutional network model: End-to-end entity and relation joint extraction based on multi-head attention graph convolutional network
  by: TAO, Zhihua, et al. Published: (2023)