Towards a more trustworthy generative artificial intelligence
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/181657
Institution: Nanyang Technological University
Summary:

In recent years, the rapid growth in available Generative Artificial Intelligence (AI) models has revolutionized various domains, from AI image generation to natural language reasoning systems such as OpenAI's ChatGPT and Google Gemini. Fuelled by advances in deep learning, these models have demonstrated unprecedented capabilities in generating life-like content. With these advances, however, come new challenges, especially in ensuring the robustness and reliability of such models when faced with adversarial attacks.

Adversarial attacks exploit the vulnerabilities of Generative AI models by subtly altering the input data with small modifications, known as perturbations, to deceive the models into producing hallucinated, distorted, or incorrect predictions.
Vision-Language Models (VLMs) are an example of Generative AI that leverages complex neural architectures, combining the capabilities of Computer Vision (CV) and Natural Language Processing (NLP) to generate captions or descriptions of images, or to produce corresponding visual content from textual prompts. In the context of VLMs, adversarial attacks can take the form of perturbed image inputs, ranging from subtle to severe, created with techniques such as Gaussian blur, dithering, and contrast manipulation, and leading to hallucinated image captions or descriptions. Hallucinations are erroneous or implausible outputs generated by AI models, particularly in generative tasks; in this context, they manifest as generated images or captions that deviate significantly from the intended input.
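
As a rough illustration of the kinds of input perturbations described above, the sketch below applies Gaussian blur, dithering, and contrast manipulation to an image with the Pillow library. This is a minimal example under assumed parameters: the project's actual perturbation pipeline, severity levels, and file names (e.g. "input.jpg") are not specified here, and the values shown are purely illustrative.

```python
# Minimal sketch of image perturbations (Gaussian blur, dithering, contrast
# manipulation) using Pillow. File names and severity values are assumptions.
from PIL import Image, ImageEnhance, ImageFilter


def perturb(image: Image.Image, kind: str, severity: float) -> Image.Image:
    """Apply one perturbation type to an RGB image at a given severity."""
    if kind == "gaussian_blur":
        # Larger radius -> stronger blur.
        return image.filter(ImageFilter.GaussianBlur(radius=severity))
    if kind == "dither":
        # Quantise to a small colour palette; Floyd-Steinberg dithering is
        # applied by default when converting RGB -> P mode.
        colors = max(2, int(severity))
        return image.convert("P", palette=Image.ADAPTIVE, colors=colors).convert("RGB")
    if kind == "contrast":
        # A factor of 1.0 leaves the image unchanged; <1 washes it out, >1 exaggerates it.
        return ImageEnhance.Contrast(image).enhance(severity)
    raise ValueError(f"unknown perturbation: {kind}")


if __name__ == "__main__":
    img = Image.open("input.jpg").convert("RGB")  # hypothetical input image
    variants = {
        "gaussian_blur": perturb(img, "gaussian_blur", severity=3.0),
        "dither": perturb(img, "dither", severity=8),      # 8-colour palette
        "contrast": perturb(img, "contrast", severity=0.4),
    }
    for name, out in variants.items():
        out.save(f"perturbed_{name}.jpg")
```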
Motivated by the need to address this vulnerability, this project systematically evaluates the robustness of VLMs under a variety of adversarial perturbations, with a focus on benchmarking hallucinations. Existing hallucination benchmarks face challenges such as limited task specificity and a lack of depth in assessing VLM responses across diverse perturbations. By addressing these gaps, this study aims to deepen our understanding of VLM limitations and contribute to the development of more robust and reliable vision-language models, laying a foundation for improving AI models' resilience to real-world distortions.