Dataset compression



Bibliographic Details
Main Author: Xiao, Lingao
Other Authors: Weichen Liu
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/175177
Description
Abstract: This study explores dataset distillation and pruning, two important methods for managing and optimizing datasets for machine learning. The goal is to understand the impact of various dataset distillation methods, such as Performance Matching, Gradient Matching, Distribution Matching, Trajectory Matching, and BN Matching, on creating compact datasets that retain the essence of their larger counterparts. Additionally, dataset pruning (coreset selection) techniques such as Forgetting, AUM, Entropy (Uncertainty), EL2N, SSP, and CCS are examined for their ability to refine datasets by removing less informative samples. By combining these methodologies, we hope to gain a nuanced understanding of dataset optimization, which is crucial for improving the efficacy and efficiency of machine learning models. We also conduct experiments on weight perturbation and reduced training steps, and explore curriculum learning to further enrich our discussion. We hope this comprehensive treatment of dataset compression helps propel machine learning models toward higher levels of success.
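To illustrate the pruning scores the abstract mentions: the EL2N score of a training example is commonly defined as the L2 norm of the error between the model's softmax output and the one-hot label, and low-scoring (easy, confidently correct) examples are pruned first. The following is a minimal NumPy sketch, not the thesis's implementation; the function names and the synthetic probability values are illustrative assumptions.

```python
import numpy as np

def el2n_scores(probs, labels):
    """EL2N score per example: L2 norm of (softmax output - one-hot label).

    probs: (n, k) array of predicted class probabilities.
    labels: (n,) array of integer class labels.
    """
    n, k = probs.shape
    onehot = np.eye(k)[labels]
    return np.linalg.norm(probs - onehot, axis=1)

def keep_hardest(scores, keep_fraction):
    """Coreset selection sketch: keep indices of the highest-scoring
    (most informative) examples, pruning the easy ones."""
    n_keep = int(len(scores) * keep_fraction)
    return np.argsort(scores)[-n_keep:]

# Synthetic example: 4 examples, 3 classes, all with true label 0.
probs = np.array([
    [0.9, 0.05, 0.05],  # confident and correct -> low EL2N score
    [0.4, 0.3, 0.3],    # uncertain -> higher score
    [0.1, 0.8, 0.1],    # confident but wrong -> highest score
    [0.7, 0.2, 0.1],    # mostly correct -> low-to-mid score
])
labels = np.array([0, 0, 0, 0])

scores = el2n_scores(probs, labels)
keep = keep_hardest(scores, keep_fraction=0.5)  # retains examples 1 and 2
```

Other scores listed in the abstract (Forgetting, AUM, Entropy) follow the same pattern: assign each example a scalar informativeness score, then select a subset by ranking; approaches such as CCS additionally balance coverage across the score distribution rather than simply keeping the hardest examples.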