Inference acceleration of large language models

This dissertation examines the challenges and bottlenecks that current large language models face during inference from three perspectives: data, model, and system. It identifies key factors affecting inference speed, including data processing efficiency, m...

Bibliographic Details
Main Author:	Zhang, Boyu
Other Authors:	Mao Kezhi
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Large language model; Quantization; Approximate computation; Self-attention; Transformer
Online Access:	https://hdl.handle.net/10356/181660
Institution:	Nanyang Technological University
