Quantifying combinatorial capabilities of image-generating AI

Bibliographic Details
Main Author: Poon, Wei Kang
Other Authors: Li Boyang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
Online Access: https://hdl.handle.net/10356/181225
Institution: Nanyang Technological University
Item Description
Summary: This study investigates the combinatorial capabilities of state-of-the-art generative AI models—Midjourney, DALL-E, and Stable Diffusion—when tasked with generating images containing underrepresented object combinations. Utilizing the LAION400M dataset as a proxy for the models' training data, I analyzed the distribution of object occurrences and found a long-tailed distribution, with a few objects appearing frequently and many appearing rarely. I categorized object combinations into four quadrants based on frequency and association strength and crafted prompts to include these combinations. The generated images were evaluated using automated metrics: CLIPScore, Aesthetic Score, and ImageReward. My findings reveal that as the number of objects in a prompt increases, the performance of the models declines, regardless of the objects' frequency or association in the training data. This suggests that the complexity introduced by multiple objects poses a greater challenge than the rarity of individual objects. The study highlights the need for training datasets that better represent diverse object combinations and attributes to enhance the generative capabilities of AI models in complex scenes.
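
The four-quadrant categorization described in the summary (frequency crossed with association strength) could be sketched roughly as below. This is a minimal illustrative sketch, not the thesis's actual procedure: treating captions as sets of detected objects, using pointwise mutual information as the association measure, and splitting at the medians are all assumptions made here for illustration.

# Hypothetical sketch: binning object pairs into four quadrants by
# co-occurrence frequency and association strength. PMI and median
# thresholds are assumptions, not the method used in the thesis.
from collections import Counter
from itertools import combinations
from math import log2
from statistics import median

def quadrantize(captions: list[set[str]]) -> dict[tuple[str, str], str]:
    """Assign each object pair to one of four quadrants:
    (frequent/rare) x (strong/weak association)."""
    obj_counts = Counter()
    pair_counts = Counter()
    for objs in captions:
        obj_counts.update(objs)
        pair_counts.update(combinations(sorted(objs), 2))

    n = len(captions)
    freq = {p: c / n for p, c in pair_counts.items()}
    # Pointwise mutual information as a stand-in for "association strength".
    pmi = {
        (a, b): log2((c / n) / ((obj_counts[a] / n) * (obj_counts[b] / n)))
        for (a, b), c in pair_counts.items()
    }

    f_thresh = median(freq.values())
    a_thresh = median(pmi.values())
    quadrants = {}
    for p in pair_counts:
        f = "frequent" if freq[p] >= f_thresh else "rare"
        a = "strong" if pmi[p] >= a_thresh else "weak"
        quadrants[p] = f"{f}/{a}"
    return quadrants

# Toy usage: each caption is represented as the set of objects it mentions.
caps = [{"dog", "ball"}, {"dog", "ball"}, {"cat", "piano"}, {"dog", "cat"}]
print(quadrantize(caps))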