Dataset compression

This study explores dataset distillation and pruning, which are important methods for managing and optimizing datasets for machine learning. The goal is to understand the impact of various dataset distillation methods such as Performance Matching, Gradient Matching, Distribution Matching, Trajectory...

Bibliographic Details
Main Author:	Xiao, Lingao
Other Authors:	Weichen Liu
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Efficiency
Online Access:	https://hdl.handle.net/10356/175177
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-175177
record_format	dspace
spelling	sg-ntu-dr.10356-1751772024-04-19T15:43:12Z Dataset compression Xiao, Lingao Weichen Liu School of Computer Science and Engineering liu@ntu.edu.sg Computer and Information Science Efficiency This study explores dataset distillation and pruning, which are important methods for managing and optimizing datasets for machine learning. The goal is to understand the impact of various dataset distillation methods such as Performance Matching, Gradient Matching, Distribution Matching, Trajectory Matching, and BN Matching on creating compact datasets that retain the essence of their larger counterparts. Additionally, dataset pruning or coreset selection techniques such as Forgetting, AUM, Entropy (Uncertainty), EL2N, SSP, and CCS are examined for their ability to refine datasets by removing less informative samples. By combining these methodologies, we hope to gain a nuanced understanding of dataset optimization, which is crucial for improving the efficacy and efficiency of machine learning models. We also conduct experiments on weight perturbation and reduced training steps, as well as explore curriculum learning to further enrich our discourse. This comprehensive treatise on dataset compression can help propel machine-learning models towards higher levels of success. Bachelor's degree 2024-04-19T12:04:44Z 2024-04-19T12:04:44Z 2024 Final Year Project (FYP) Xiao, L. (2024). Dataset compression. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175177 https://hdl.handle.net/10356/175177 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science; Efficiency
spellingShingle	Computer and Information Science Efficiency Xiao, Lingao Dataset compression
description	This study explores dataset distillation and pruning, which are important methods for managing and optimizing datasets for machine learning. The goal is to understand the impact of various dataset distillation methods such as Performance Matching, Gradient Matching, Distribution Matching, Trajectory Matching, and BN Matching on creating compact datasets that retain the essence of their larger counterparts. Additionally, dataset pruning or coreset selection techniques such as Forgetting, AUM, Entropy (Uncertainty), EL2N, SSP, and CCS are examined for their ability to refine datasets by removing less informative samples. By combining these methodologies, we hope to gain a nuanced understanding of dataset optimization, which is crucial for improving the efficacy and efficiency of machine learning models. We also conduct experiments on weight perturbation and reduced training steps, as well as explore curriculum learning to further enrich our discourse. This comprehensive treatise on dataset compression can help propel machine-learning models towards higher levels of success.
author2	Weichen Liu
author_facet	Weichen Liu Xiao, Lingao
format	Final Year Project
author	Xiao, Lingao
author_sort	Xiao, Lingao
title	Dataset compression
title_short	Dataset compression
title_full	Dataset compression
title_fullStr	Dataset compression
title_full_unstemmed	Dataset compression
title_sort	dataset compression
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/175177
_version_	1806059818608230400
