Intriguing properties of data attribution on diffusion models

Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or...

Bibliographic Details
Main Authors:	ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/9271 https://ink.library.smu.edu.sg/context/sis_research/article/10271/viewcontent/2311.00500v2.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10271
record_format	dspace
spelling	sg-smu-ink.sis_research-10271; 2024-10-17T07:30:24Z; Intriguing properties of data attribution on diffusion models; ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min; 2024-05-01T07:00:00Z; text; application/pdf; https://ink.library.smu.edu.sg/sis_research/9271; https://ink.library.smu.edu.sg/context/sis_research/article/10271/viewcontent/2311.00500v2.pdf; http://creativecommons.org/licenses/by-nc-nd/4.0/; Research Collection School Of Computing and Information Systems; eng; Institutional Knowledge at Singapore Management University; Databases and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Databases and Information Systems
spellingShingle	Databases and Information Systems; ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min; Intriguing properties of data attribution on diffusion models
description	Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.
format	text
author	ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min
author_facet	ZHENG, Xiaosen; PANG, Tianyu; DU, Chao; JIANG, Jing; LIN, Min
author_sort	ZHENG, Xiaosen
title	Intriguing properties of data attribution on diffusion models
title_sort	intriguing properties of data attribution on diffusion models
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9271 https://ink.library.smu.edu.sg/context/sis_research/article/10271/viewcontent/2311.00500v2.pdf
_version_	1814047928092721152

Similar Items