Few-shot learner parameterization by diffusion time-steps

Even with large multi-modal foundation models, few-shot learning remains challenging: without a proper inductive bias, it is nearly impossible to keep the nuanced class attributes while removing the visually prominent attributes that spuriously correlate with class labels. To this end, we identify an inductive bias: the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, since as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the visually prominent, spurious ones. Building on this, we propose the Time-step Few-shot (TiF) learner. We train class-specific low-rank adapters for a text-conditioned DM to make up for the lost attributes, so that images can be accurately reconstructed from their noisy versions given a prompt. Hence, at a small time-step, the adapter and prompt are essentially a parameterization of only the nuanced class attributes. For a test image, this parameterization can be used to extract only the nuanced class attributes for classification. The TiF learner significantly outperforms OpenCLIP and its adapters on a variety of fine-grained and customized few-shot learning tasks. Code is available at https://github.com/yue-zhongqi/tif.
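A minimal sketch of the classification rule implied by the abstract, for readers who want the gist before opening the repository. It is not the authors' implementation: denoiser is a hypothetical callable standing in for the text-conditioned DM with a class-specific low-rank adapter applied, and prompts, alphas_cumprod, t_small, and n_trials are illustrative placeholders.

import torch

@torch.no_grad()
def classify(x0, denoiser, prompts, alphas_cumprod, t_small=50, n_trials=4):
    """Return the index of the class whose adapter/prompt pair yields the
    lowest noise-prediction error on x0 at a small diffusion time-step."""
    a_bar = alphas_cumprod[t_small]  # cumulative product of alphas at t_small
    errors = []
    for class_id, prompt in enumerate(prompts):
        err = 0.0
        for _ in range(n_trials):  # average over independent noise draws
            eps = torch.randn_like(x0)
            # Forward diffusion: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps.
            x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
            # The class-specific adapter is assumed to be selected inside
            # denoiser via class_id.
            eps_hat = denoiser(x_t, t_small, prompt, class_id)
            err += torch.mean((eps - eps_hat) ** 2).item()
        errors.append(err / n_trials)
    return min(range(len(prompts)), key=errors.__getitem__)

The small t_small is the crux of the method as the abstract describes it: at small time-steps only the nuanced class attributes have been destroyed by noise, so the per-class noise-prediction error measures those attributes rather than the visually prominent spurious ones.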

Bibliographic Details
Main Authors: YUE, Zhongqi, ZHOU, Pan, HONG, Richang, ZHANG, Hanwang, SUN, Qianru
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2024
Subjects: Graphics and Human Computer Interfaces
Collection: Research Collection School Of Computing and Information Systems
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Online Access:https://ink.library.smu.edu.sg/sis_research/9019
https://ink.library.smu.edu.sg/context/sis_research/article/10022/viewcontent/2024_CVPR_few_shot.pdf