Quantifying combinatorial capabilities of image-generating AI
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/181225
Institution: Nanyang Technological University
Summary: This study investigates the combinatorial capabilities of state-of-the-art generative AI models (Midjourney, DALL-E, and Stable Diffusion) when tasked with generating images containing underrepresented object combinations. Utilizing the LAION400M dataset as a proxy for the models' training data, I analyzed the distribution of object occurrences and found a long-tailed distribution, with a few objects appearing frequently and many appearing rarely. I categorized object combinations into four quadrants based on frequency and association strength and crafted prompts to include these combinations. The generated images were evaluated using automated metrics: CLIPScore, Aesthetic Score, and ImageReward. My findings reveal that as the number of objects in a prompt increases, the performance of the models declines, regardless of the objects' frequency or association in the training data. This suggests that the complexity introduced by multiple objects poses a greater challenge than the rarity of individual objects. The study highlights the need for training datasets that better represent diverse object combinations and attributes to enhance the generative capabilities of AI models in complex scenes.
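The abstract describes the four-quadrant categorization only at a high level. As an illustrative sketch rather than the project's actual pipeline, one way to realize it is to count object occurrences and pair co-occurrences over LAION400M captions, use pointwise mutual information (PMI) as the association-strength measure, and split pairs at the median of each axis; the PMI choice, the median thresholds, and the function names below are assumptions, not details taken from the project.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median

def quadrant_split(captions, objects):
    """Assign object pairs to four quadrants by frequency and association strength.

    `captions`: iterable of tokenized captions (e.g., sets of object labels
    detected in each LAION400M caption). `objects`: the object vocabulary.
    Association strength is approximated here with PMI (an assumption);
    pair frequency is approximated by the rarer member's occurrence count.
    """
    n = 0
    occur = Counter()    # per-object caption counts
    cooccur = Counter()  # object-pair co-occurrence counts
    for caption in captions:
        present = set(caption) & set(objects)
        n += 1
        occur.update(present)
        cooccur.update(combinations(sorted(present), 2))

    stats = {}
    for (a, b), c_ab in cooccur.items():
        p_ab = c_ab / n
        p_a, p_b = occur[a] / n, occur[b] / n
        pmi = math.log(p_ab / (p_a * p_b))
        freq = min(occur[a], occur[b])
        stats[(a, b)] = (freq, pmi)

    # Median thresholds give the four quadrants:
    #   (frequent, strong), (frequent, weak), (rare, strong), (rare, weak)
    f_med = median(f for f, _ in stats.values())
    p_med = median(p for _, p in stats.values())
    quadrants = {}
    for pair, (freq, pmi) in stats.items():
        key = ("frequent" if freq >= f_med else "rare",
               "strong" if pmi >= p_med else "weak")
        quadrants.setdefault(key, []).append(pair)
    return quadrants
```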
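Of the three automated metrics, CLIPScore has a standard definition: a scaled, clipped cosine similarity between CLIP embeddings of the prompt and the generated image. A minimal sketch using the Hugging Face `transformers` CLIP implementation is shown below; the `openai/clip-vit-base-patch32` checkpoint and the 2.5 scaling factor follow the original CLIPScore formulation and may differ from the backbone this project actually used.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Reference CLIPScore: 2.5 * max(cos(E_image, E_text), 0)
_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_score(image_path: str, prompt: str) -> float:
    """Score how well a generated image matches its prompt."""
    image = Image.open(image_path).convert("RGB")
    inputs = _processor(text=[prompt], images=image,
                        return_tensors="pt", truncation=True)
    img_emb = _model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = _model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    # Normalize embeddings, then take the cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    cos = (img_emb * txt_emb).sum().item()
    return 2.5 * max(cos, 0.0)
```

In a setup like the one the abstract describes, such a score would be computed per generated image and then aggregated by quadrant and by the number of objects in the prompt.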