Inference acceleration of large language models

This dissertation delves into the challenges and bottlenecks faced by current large language models during inference from three core perspectives: data, model, and system. Through meticulous research, key factors impacting inference speed are identified, encompassing data processing efficiency, m...

Bibliographic Details
Main Author:	Zhang, Boyu
Other Authors:	Mao Kezhi
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Large language model; Quantization; Approximate computation; Self-attention; Transformer
Online Access:	https://hdl.handle.net/10356/181660
Institution:	Nanyang Technological University

Similar Items

Enhancing online safety: leveraging large language models for community moderation in Singlish dialect
by: Goh, Zheng Ying
Published: (2024)

Optimizing large language model inference
by: Shao, Siyang
Published: (2025)

Efficient inference offloading for mixture-of-experts large language models in internet of medical things
by: Yuan, Xiaoming, et al.
Published: (2024)

Heuristic development in the use of large language models for materials science
by: Chye, Vincent Zhen Guang
Published: (2024)

QuantfolioX: portfolio management application using large language model technology
by: Teo, Charlotte Xuan Qin
Published: (2024)

AIView: helping students prepare for software engineering technical interviews using large language models
by: Prasad Shubhangam Rahesh
Published: (2025)

Bias problems in large language models and how to mitigate them
by: Ong, Adrian Zhi Ying
Published: (2024)

Multi-modal large language model for drug development
by: Su, Gaoyang
Published: (2025)

Graph data query and visualization via large language models
by: Lim, Kian Yew
Published: (2025)

Reliable, efficient and light distance computation on high-dimensional vectors
by: Gao, Jianyang
Published: (2025)

Test case generation from specifications using natural language processing and large language models
by: Leung, Andrew Chun Kit
Published: (2025)

Machine translation of multilingual cybersecurity reports with large language models
by: Chua, Jaedon Boon Chong
Published: (2025)

Solution generation for university math problems using large language models
by: Wirja, Louis
Published: (2024)

Leveraging large language models and BERT for log parsing and anomaly detection
by: Zhou, Yihan, et al.
Published: (2024)

Punctuation restoration for speech transcripts using large language models
by: Liu, Changsong
Published: (2024)

MCQGen: a large language model-driven MCQ generator for personalized learning
by: Hang, Ching Nam, et al.
Published: (2024)

Exploring large language model (LLM) impacts on building energy applications
by: Wu, Mian
Published: (2025)

Model-driven smart contract generation leveraging pretrained large language models
by: Jiang, Qinbo
Published: (2024)

Financial trading in the digital age: the integration of large language model and reinforcement learning
by: Zhao, Lingxuan
Published: (2024)

Framework to evaluate and test defences against hallucination in large language model
by: Pan, Johnny Shi Han
Published: (2024)

Integrating evolutionary algorithms with large language models for enhanced problem solving
by: Hirashima Shunya
Published: (2025)

Event extraction and beyond: from conventional NLP to large language models
by: Zhou, Hanzhang
Published: (2025)

Investigating large language model pruning techniques
by: Cheng, Yixiao
Published: (2025)

Leveraging large language models for effective user interaction via conversations
by: Zhang, Mengao
Published: (2024)

Benchmarking large multimodal language models for fine-grained video understanding
by: Wu, Xinran
Published: (2025)

Transcription software with language model integration
by: Najah Ismail
Published: (2024)

Personality prediction based on large language models
by: Wee, Jewel Xin Yu
Published: (2024)

Large language model (LLM) with retrieve-augmented generation (RAG) for legal case research
by: Liu, Zihao
Published: (2024)

Empowering natural language processing in low-resource regimes
by: Feng, Zijian
Published: (2025)

Clean-label backdoor attack and defense: an examination of language model vulnerability
by: Zhao, Shuai, et al.
Published: (2025)

Genixer: Empowering multimodal Large Language Models as a powerful data generator
by: ZHAO, Henry Hengyuan, et al.
Published: (2024)

Time series task extraction from large language models
by: Toh, Leong Seng
Published: (2024)

Don’t just say “I don’t know”! Self-aligning Large Language Models for responding to unknown questions with explanations
by: DENG, Yang, et al.
Published: (2024)

A comprehensive study on optimization techniques for AMR robots recognition models
by: Zheng, Hao Peng
Published: (2025)

Transforming object-oriented Java education: harnessing large language models for enhanced learning
by: Teo, Brian Hong Guan
Published: (2025)

Skin beauty adviser assistant based on large language model and computer vision
by: Jiang, Yuwei
Published: (2025)

Large language model powered agents in the web
by: DENG, Yang, et al.
Published: (2024)

Collaborative cross-modal fusion with Large Language Model for recommendation
by: LIU, Zhongzhou, et al.
Published: (2024)

An enhanced deep reinforcement learning ensemble empowered by large language model
by: Li, Xinyi
Published: (2024)

Large language model powered agents for information retrieval
by: ZHANG, An, et al.
Published: (2024)