Intriguing properties of data attribution on diffusion models
Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or...
Main Authors: | ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2024 |
Subjects: | Databases and Information Systems |
Online Access: | https://ink.library.smu.edu.sg/sis_research/9271 https://ink.library.smu.edu.sg/context/sis_research/article/10271/viewcontent/2311.00500v2.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-10271 |
---|---|
record_format | dspace |
spelling | sg-smu-ink.sis_research-10271 2024-10-17T07:30:24Z Intriguing properties of data attribution on diffusion models ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK. 2024-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/9271 https://ink.library.smu.edu.sg/context/sis_research/article/10271/viewcontent/2311.00500v2.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | Databases and Information Systems |
spellingShingle | Databases and Information Systems ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min Intriguing properties of data attribution on diffusion models |
description | Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK. |
format | text |
author | ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min |
author_facet | ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min |
author_sort | ZHENG, Xiaosen |
title | Intriguing properties of data attribution on diffusion models |
title_short | Intriguing properties of data attribution on diffusion models |
title_full | Intriguing properties of data attribution on diffusion models |
title_fullStr | Intriguing properties of data attribution on diffusion models |
title_full_unstemmed | Intriguing properties of data attribution on diffusion models |
title_sort | intriguing properties of data attribution on diffusion models |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2024 |
url | https://ink.library.smu.edu.sg/sis_research/9271 https://ink.library.smu.edu.sg/context/sis_research/article/10271/viewcontent/2311.00500v2.pdf |
_version_ | 1814047928092721152 |