LLM hallucination study

Large Language Models (LLMs) exhibit impressive generative capabilities but often produce hallucinations—outputs that are factually incorrect, misleading, or entirely fabricated. These hallucinations pose significant challenges in high-stakes applications such as medical diagnosis, legal reasoning, financial analysis, and scientific research, where factual accuracy is critical. As LLMs become increasingly integrated into real-world systems, mitigating hallucinations is essential to ensuring reliable, trustworthy, and ethically sound AI deployments. Without effective strategies to reduce hallucinations, AI-generated content risks contributing to misinformation, undermining user trust, and limiting the adoption of LLMs in professional domains.

My report investigates techniques to reduce hallucinations through systematic experimentation on Meta's LLaMA model, a state-of-the-art open-source LLM. Specifically, I explored the impact of key generative parameters, namely temperature scaling and top-k sampling, together with retrieval-augmented generation, on factuality and coherence. These techniques play a crucial role in balancing response creativity and accuracy, directly influencing the probability of hallucinated content. By carefully tuning these hyperparameters and integrating external knowledge retrieval, I aimed to assess how different configurations affect the reliability of LLaMA's generated responses.

I systematically evaluated the effectiveness of these mitigation techniques using factuality scoring and response coherence analysis. Factuality was assessed by measuring the alignment of generated responses with authoritative sources, while coherence analysis examined the logical consistency and contextual appropriateness of outputs. Results from these experiments provide quantitative insights into the trade-offs between factual reliability, creativity, and response variability, offering practical guidelines for optimizing LLMs across different use cases.
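
The decoding experiments described above hinge on how temperature and top-k are set at generation time. As a rough, hypothetical illustration (not the report's actual setup), the sketch below shows how these settings are exposed when sampling from a Llama-family checkpoint with the Hugging Face transformers library; the model identifier, prompt, and parameter values are assumptions.

```python
# Minimal sketch of temperature / top-k decoding, assuming the Hugging Face
# transformers library and an assumed Llama checkpoint; all values are
# illustrative, not the report's actual configuration.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # assumed model identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

prompt = "In which year was Nanyang Technological University established?"
inputs = tokenizer(prompt, return_tensors="pt")

# Lower temperature and a smaller top-k concentrate sampling on
# high-probability tokens, trading creativity for factual stability;
# higher values increase diversity and, typically, hallucination risk.
output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.3,   # illustrative value
    top_k=20,          # illustrative value
    max_new_tokens=64,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Sweeping temperature and top_k over a grid and scoring each configuration's outputs for factuality would reproduce the kind of trade-off analysis the abstract describes.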

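Retrieval-augmented generation, the other mitigation studied, grounds the model in external documents before it answers. The toy sketch below is a hypothetical illustration, not the report's pipeline: it retrieves the most relevant passage from a small in-memory corpus with TF-IDF and prepends it to the prompt; the corpus, query, and prompt template are invented for the example.

```python
# Toy retrieval-augmented generation sketch: TF-IDF retrieval over a tiny
# in-memory corpus, with the retrieved passage prepended as grounding
# context. Corpus, query, and prompt format are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Nanyang Technological University was established in 1991.",
    "The handle system assigns persistent identifiers to digital objects.",
    "Water boils at 100 degrees Celsius at standard atmospheric pressure.",
]

def retrieve(query: str, documents: list[str]) -> str:
    """Return the document most similar to the query under TF-IDF cosine similarity."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(documents + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    return documents[scores.argmax()]

query = "When was NTU established?"
context = retrieve(query, corpus)

# Grounding the model in retrieved text is the core mechanism by which RAG
# reduces hallucinations: the prompt instructs the model to answer only
# from the supplied context, which can then be checked against sources.
grounded_prompt = (
    f"Context: {context}\n"
    "Answer the question using only the context above.\n"
    f"Question: {query}\nAnswer:"
)
print(grounded_prompt)  # this prompt would then be passed to the model's generate() call
```
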
Bibliographic Details
Main Author: Potdar, Prateek Anish
Other Authors: Jun Zhao (College of Computing and Data Science, junzhao@ntu.edu.sg)
Format: Final Year Project (FYP)
Degree: Bachelor's degree
Language: English
Published: Nanyang Technological University, 2025
Project Code: CCDS24-0767
Subjects: Computer and Information Science; LLM; Hallucination; RAG
Collection: DR-NTU (NTU Library)
Citation: Potdar, P. A. (2025). LLM hallucination study. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/183825
Online Access: https://hdl.handle.net/10356/183825