Few-shot learner parameterization by diffusion time-steps

Even with large multi-modal foundation models, few-shot learning remains challenging: without a proper inductive bias, it is nearly impossible to keep the nuanced class attributes while removing the visually prominent attributes that spuriously correlate with class labels. To this end, we identify an inductive bias: the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, since as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the visually prominent, spurious ones. Building on this, we propose the Time-step Few-shot (TiF) learner. We train class-specific low-rank adapters for a text-conditioned DM to make up for the lost attributes, so that images can be accurately reconstructed from their noisy versions given a prompt. Hence, at a small time-step, the adapter and prompt are essentially a parameterization of only the nuanced class attributes. For a test image, this parameterization can be used to extract only the nuanced class attributes for classification. The TiF learner significantly outperforms OpenCLIP and its adapters on a variety of fine-grained and customized few-shot learning tasks. Code is available at https://github.com/yue-zhongqi/tif.
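A minimal sketch of the classification rule implied by the abstract, for readers who want the gist before opening the repository. It is not the authors' implementation: denoiser is a hypothetical callable standing in for the text-conditioned DM with a class-specific low-rank adapter applied, and prompts, alphas_cumprod, t_small, and n_trials are illustrative placeholders.

import torch

@torch.no_grad()
def classify(x0, denoiser, prompts, alphas_cumprod, t_small=50, n_trials=4):
    """Return the index of the class whose adapter/prompt pair yields the
    lowest noise-prediction error on x0 at a small diffusion time-step."""
    a_bar = alphas_cumprod[t_small]  # cumulative product of alphas at t_small
    errors = []
    for class_id, prompt in enumerate(prompts):
        err = 0.0
        for _ in range(n_trials):  # average over independent noise draws
            eps = torch.randn_like(x0)
            # Forward diffusion: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps.
            x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
            # The class-specific adapter is assumed to be selected inside
            # denoiser via class_id.
            eps_hat = denoiser(x_t, t_small, prompt, class_id)
            err += torch.mean((eps - eps_hat) ** 2).item()
        errors.append(err / n_trials)
    return min(range(len(prompts)), key=errors.__getitem__)

The small t_small is the crux of the method as the abstract describes it: at small time-steps only the nuanced class attributes have been destroyed by noise, so the per-class noise-prediction error measures those attributes rather than the visually prominent spurious ones.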

Bibliographic Details
Main Authors: YUE, Zhongqi, ZHOU, Pan, HONG, Richang, ZHANG, Hanwang, SUN, Qianru
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Subjects: Graphics and Human Computer Interfaces
Collection: Research Collection School Of Computing and Information Systems
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Online Access:https://ink.library.smu.edu.sg/sis_research/9019
https://ink.library.smu.edu.sg/context/sis_research/article/10022/viewcontent/2024_CVPR_few_shot.pdf