Generalization capacity of natural language video localization (NLVL) models
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/175072
Institution: Nanyang Technological University
Summary: Generalization is a critical feature of any machine learning model. Natural Language Video Localization (NLVL) tasks involve processing diverse video content, text queries, and timestamp distributions, making generalization a crucial aspect of model performance. Many NLVL datasets, such as Charades-STA, exhibit distributional biases in both the timestamps associated with actions in videos and the corresponding textual queries. This bias poses a significant obstacle to building robust models with strong generalization capabilities. In this study, we conducted a comprehensive evaluation of NLVL models across various perturbation scenarios to assess their robustness and sensitivities. Leveraging synthetic perturbation sets, including textual, positional, and stylistic alterations, we examined each model's performance and elucidated its strengths, weaknesses, and underlying mechanisms. Our findings revealed nuanced patterns, highlighting the model's resilience to certain perturbations, such as character swaps, and its heightened sensitivity to others, such as text style variations. Additionally, we explored the implications of dataset curation for model performance, demonstrating the effectiveness of bias mitigation techniques in reducing distributional bias within datasets. Furthermore, we introduced two new datasets, Charades-STAMerged and Charades-Ego STA, aimed at mitigating distributional bias and evaluating NLVL models' generalization on first-person video data. Through these efforts, we offer valuable insights into the performance and interpretability of NLVL models, contributing to the enhancement of model robustness, fairness, and applicability in real-world scenarios.
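The summary refers to synthetic textual perturbations such as character swaps applied to queries. As a minimal illustrative sketch only (assuming Python; the function name, parameters, and sample query below are hypothetical and not taken from the project), such a perturbation might look like:

```python
import random


def char_swap(query: str, num_swaps: int = 1, seed: int = 0) -> str:
    """Swap adjacent characters at random positions to produce a perturbed query.

    Illustrative example of a character-swap textual perturbation of the kind
    mentioned in the summary; it does not reproduce the project's actual
    perturbation sets.
    """
    rng = random.Random(seed)
    chars = list(query)
    for _ in range(num_swaps):
        if len(chars) < 2:
            break
        i = rng.randrange(len(chars) - 1)  # pick a position and swap with its neighbour
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


# Example: perturb a Charades-STA-style query before passing it to an NLVL model.
print(char_swap("person opens the closet door", num_swaps=2, seed=42))
```

A robustness evaluation of this kind would compare localization accuracy on the original queries against accuracy on the perturbed versions.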