Quantifying combinatorial capababilities of image-generating AI

This study investigates the combinatorial capabilities of state-of-the-art generative AI models—Midjourney, DALL-E, and Stable Diffusion—when tasked with generating images containing underrepresented object combinations. Utilizing the LAION400M dataset as a proxy for the models' training data...

Bibliographic Details
Main Author: Poon, Wei Kang
Other Authors: Li Boyang
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/181225
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-181225
record_format dspace
spelling sg-ntu-dr.10356-181225
2024-11-18T06:42:22Z
Quantifying combinatorial capababilities of image-generating AI
Poon, Wei Kang
Li Boyang
College of Computing and Data Science
boyang.li@ntu.edu.sg
Computer and Information Science
This study investigates the combinatorial capabilities of state-of-the-art generative AI models—Midjourney, DALL-E, and Stable Diffusion—when tasked with generating images containing underrepresented object combinations. Utilizing the LAION400M dataset as a proxy for the models' training data, I analyzed the distribution of object occurrences and found a long-tailed distribution, with a few objects appearing frequently and many appearing rarely. I categorized object combinations into four quadrants based on frequency and association strength and crafted prompts to include these combinations. The generated images were evaluated using automated metrics: CLIPScore, Aesthetic Score, and ImageReward. My findings reveal that as the number of objects in a prompt increases, the performance of the models declines, regardless of the objects' frequency or association in the training data. This suggests that the complexity introduced by multiple objects poses a greater challenge than the rarity of individual objects. The study highlights the need for training datasets that better represent diverse object combinations and attributes to enhance the generative capabilities of AI models in complex scenes
Bachelor's degree
2024-11-18T06:42:21Z
2024-11-18T06:42:21Z
2024
Final Year Project (FYP)
Poon, W. K. (2024). Quantifying combinatorial capababilities of image-generating AI. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181225
https://hdl.handle.net/10356/181225
en
SCSE23-0865
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
spellingShingle Computer and Information Science
Poon, Wei Kang
Quantifying combinatorial capababilities of image-generating AI
description This study investigates the combinatorial capabilities of state-of-the-art generative AI models—Midjourney, DALL-E, and Stable Diffusion—when tasked with generating images containing underrepresented object combinations. Utilizing the LAION400M dataset as a proxy for the models' training data, I analyzed the distribution of object occurrences and found a long-tailed distribution, with a few objects appearing frequently and many appearing rarely. I categorized object combinations into four quadrants based on frequency and association strength and crafted prompts to include these combinations. The generated images were evaluated using automated metrics: CLIPScore, Aesthetic Score, and ImageReward. My findings reveal that as the number of objects in a prompt increases, the performance of the models declines, regardless of the objects' frequency or association in the training data. This suggests that the complexity introduced by multiple objects poses a greater challenge than the rarity of individual objects. The study highlights the need for training datasets that better represent diverse object combinations and attributes to enhance the generative capabilities of AI models in complex scenes.
author2 Li Boyang
author_facet Li Boyang
Poon, Wei Kang
format Final Year Project
author Poon, Wei Kang
author_sort Poon, Wei Kang
title Quantifying combinatorial capababilities of image-generating AI
title_short Quantifying combinatorial capababilities of image-generating AI
title_full Quantifying combinatorial capababilities of image-generating AI
title_fullStr Quantifying combinatorial capababilities of image-generating AI
title_full_unstemmed Quantifying combinatorial capababilities of image-generating AI
title_sort quantifying combinatorial capababilities of image-generating ai
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/181225
_version_ 1816858968317755392
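
Illustrative note (not part of the catalog record): the description field says object combinations were sorted into four quadrants by frequency and association strength, but the record does not say how either quantity was measured. The sketch below is one possible reading, not the author's method. It assumes association strength is pointwise mutual information (PMI) over caption-level co-occurrence, and that the quadrant boundaries are two fixed thresholds. All function names, the thresholds, and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def count_objects_and_pairs(docs):
    """Count how many captions mention each object and each unordered pair."""
    object_counts = Counter()
    pair_counts = Counter()
    for objects in docs:
        unique = sorted(set(objects))  # sorted so each pair has one canonical key
        object_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return object_counts, pair_counts


def pmi(pair, object_counts, pair_counts, n_docs):
    """PMI of a pair; ASSUMED here as the association measure."""
    a, b = pair
    p_ab = pair_counts[pair] / n_docs
    p_a = object_counts[a] / n_docs
    p_b = object_counts[b] / n_docs
    return math.log2(p_ab / (p_a * p_b))


def quadrant(pair_count, pair_pmi, freq_threshold, pmi_threshold):
    """Label a pair by (frequency, association); thresholds are hypothetical."""
    freq = "frequent" if pair_count >= freq_threshold else "rare"
    assoc = "strong" if pair_pmi >= pmi_threshold else "weak"
    return f"{freq}/{assoc}"


def classify_pairs(docs, freq_threshold=2, pmi_threshold=0.0):
    """Map every observed pair to one of the four quadrant labels."""
    object_counts, pair_counts = count_objects_and_pairs(docs)
    n_docs = len(docs)
    return {
        pair: quadrant(
            count,
            pmi(pair, object_counts, pair_counts, n_docs),
            freq_threshold,
            pmi_threshold,
        )
        for pair, count in pair_counts.items()
    }


# Toy stand-in for object lists extracted from captions (made-up data).
TOY_DOCS = [
    ["dog", "ball"], ["dog", "ball"], ["dog", "ball"], ["dog", "cat"],
    ["cat", "sofa"], ["dog", "sofa"], ["zebra", "piano"], ["cat", "ball"],
]
```

Calling `classify_pairs(TOY_DOCS)` returns a dictionary keyed by alphabetically ordered object pairs. On real caption data the thresholds would more plausibly come from the observed distributions (for example medians) than from fixed constants.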