Quantifying combinatorial capabilities of image-generating AI
This study investigates the combinatorial capabilities of state-of-the-art generative AI models (Midjourney, DALL-E, and Stable Diffusion) when tasked with generating images containing underrepresented object combinations. Utilizing the LAION-400M dataset as a proxy for the models' training data...
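The full abstract in this record reports that object occurrences in the proxy dataset follow a long-tailed distribution: a few objects appear frequently and many appear rarely. The following is a minimal sketch of that kind of count. It assumes each image-caption pair has already been reduced to a set of object labels. The labelling step is not shown, and the helper names `object_frequencies` and `head_share` and the toy data are hypothetical illustrations, not the project's code.

```python
from collections import Counter

def object_frequencies(annotations):
    """Count how many samples mention each object label, most frequent first.

    `annotations` is a hypothetical list of label collections, one per
    image-caption pair; a label repeated within one sample is counted once.
    """
    counts = Counter()
    for labels in annotations:
        counts.update(set(labels))
    return counts.most_common()

def head_share(freqs, k):
    """Fraction of all object occurrences covered by the k most frequent objects."""
    total = sum(count for _, count in freqs)
    return sum(count for _, count in freqs[:k]) / total

if __name__ == "__main__":
    # Toy data only; real input would come from labelling dataset captions.
    toy = [{"person", "car"}, {"person"}, {"person", "dog"}, {"person", "car"}, {"kite"}]
    freqs = object_frequencies(toy)
    print(freqs[:2])             # [('person', 4), ('car', 2)]
    print(head_share(freqs, 1))  # 0.5
```

A long tail shows up in this summary as a large `head_share` for small `k`, alongside many labels whose counts are only one or two.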
Main Author: | Poon, Wei Kang |
---|---|
Other Authors: | Li Boyang |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science |
Online Access: | https://hdl.handle.net/10356/181225 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-181225 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-181225; 2024-11-18T06:42:22Z; Quantifying combinatorial capabilities of image-generating AI; Poon, Wei Kang; Li Boyang, College of Computing and Data Science, boyang.li@ntu.edu.sg; Computer and Information Science; Bachelor's degree; 2024-11-18T06:42:21Z; 2024; Final Year Project (FYP); Poon, W. K. (2024). Quantifying combinatorial capabilities of image-generating AI. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181225; en; SCSE23-0865; application/pdf; Nanyang Technological University |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | Computer and Information Science |
description | This study investigates the combinatorial capabilities of state-of-the-art generative AI models (Midjourney, DALL-E, and Stable Diffusion) when tasked with generating images containing underrepresented object combinations. Using the LAION-400M dataset as a proxy for the models' training data, I analyzed the distribution of object occurrences and found a long-tailed distribution: a few objects appear frequently and many appear rarely. I categorized object combinations into four quadrants based on frequency and association strength and crafted prompts that include these combinations. The generated images were evaluated with three automated metrics: CLIPScore, Aesthetic Score, and ImageReward. My findings reveal that as the number of objects in a prompt increases, the performance of the models declines, regardless of the objects' frequency or association in the training data. This suggests that the complexity introduced by multiple objects poses a greater challenge than the rarity of individual objects. The study highlights the need for training datasets that better represent diverse object combinations and attributes to enhance the generative capabilities of AI models in complex scenes. |
author2 | Li Boyang |
format | Final Year Project |
author | Poon, Wei Kang |
title | Quantifying combinatorial capabilities of image-generating AI |
publisher | Nanyang Technological University |
publishDate | 2024 |
url | https://hdl.handle.net/10356/181225 |
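The abstract describes sorting object combinations into four quadrants by frequency and association strength, but it does not state which measures or thresholds were used. The following is a minimal sketch under three explicit assumptions, none of them taken from the project:

- "frequency" means how often a pair of objects co-occurs;
- "association strength" is pointwise mutual information (PMI);
- each axis is split at its median.

The helper names (`pair_stats`, `pmi`, `quadrants`) and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median

def pair_stats(annotations):
    """Count single objects and unordered object pairs across samples.

    `annotations` is a hypothetical list of label collections, one per
    image-caption pair.
    """
    single, pair = Counter(), Counter()
    for labels in annotations:
        labels = sorted(set(labels))
        single.update(labels)
        pair.update(combinations(labels, 2))  # pairs stored as sorted tuples
    return single, pair, len(annotations)

def pmi(a, b, single, pair, n):
    """PMI = log(P(a, b) / (P(a) P(b))), the association measure assumed here."""
    return math.log(pair[(a, b)] * n / (single[a] * single[b]))

def quadrants(annotations):
    """Label each co-occurring pair (frequent|rare, strong|weak) via median splits."""
    single, pair, n = pair_stats(annotations)
    stats = {(a, b): (count, pmi(a, b, single, pair, n)) for (a, b), count in pair.items()}
    freq_cut = median(count for count, _ in stats.values())
    assoc_cut = median(score for _, score in stats.values())
    return {
        p: ("frequent" if count > freq_cut else "rare",
            "strong" if score > assoc_cut else "weak")
        for p, (count, score) in stats.items()
    }

if __name__ == "__main__":
    # Toy data only: 7 samples chosen so each pair lands in a different quadrant.
    toy = ([{"dog", "frisbee"}] * 3 + [{"dog", "sofa"}]
           + [{"cat", "sofa"}] * 2 + [{"cat", "umbrella"}])
    for p, quadrant in sorted(quadrants(toy).items()):
        print(p, quadrant)
```

On this toy data, ('dog', 'frisbee') is classed as frequent/strong, ('cat', 'sofa') as frequent/weak, ('cat', 'umbrella') as rare/strong, and ('dog', 'sofa') as rare/weak. Because PMI rewards pairs whose objects rarely appear apart, a pair can be rare yet strongly associated. That is the kind of combination a frequency-only split would miss.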