Intriguing properties of data attribution on diffusion models

Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or...

Bibliographic Details
Main Authors: ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2024
Subjects: Databases and Information Systems
Online Access: https://ink.library.smu.edu.sg/sis_research/9271
https://ink.library.smu.edu.sg/context/sis_research/article/10271/viewcontent/2311.00500v2.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-10271
record_format dspace
spelling sg-smu-ink.sis_research-10271 2024-10-17T07:30:24Z Intriguing properties of data attribution on diffusion models ZHENG, Xiaosen PANG, Tianyu DU, Chao JIANG, Jing LIN, Min Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK. 2024-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9271 https://ink.library.smu.edu.sg/context/sis_research/article/10271/viewcontent/2311.00500v2.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
description Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.
format text
author ZHENG, Xiaosen
PANG, Tianyu
DU, Chao
JIANG, Jing
LIN, Min
title Intriguing properties of data attribution on diffusion models
publisher Institutional Knowledge at Singapore Management University
publishDate 2024
url https://ink.library.smu.edu.sg/sis_research/9271
https://ink.library.smu.edu.sg/context/sis_research/article/10271/viewcontent/2311.00500v2.pdf
_version_ 1814047928092721152