Generalization capacity of natural language video localization (NLVL) models

Generalization is a critical feature of any machine learning model. Natural Language Video Localization (NLVL) tasks involve processing diverse video content, text queries, and timestamp distributions, making generalization a crucial aspect of model performance. Many NLVL datasets, such as Charades-...

Bibliographic details
Main author:	Dhanyamraju, Harsh Rao
Other authors:	Sun Aixin
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Temporal sentence grounding in video; Natural language video localisation; Generalization; Multimodal; Video moment retrieval; Bias; Deep learning; Artificial intelligence; NLVL; TSGV; VMR; Charades-STA-merged; Charades-ego-STA; Charades-STA; Computer vision; Natural language processing; Information retrieval
Online access:	https://hdl.handle.net/10356/175072
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-175072
record_format	dspace
spelling	sg-ntu-dr.10356-175072; 2024-04-19T15:42:08Z; Dhanyamraju, Harsh Rao; Sun Aixin; School of Computer Science and Engineering; AXSun@ntu.edu.sg; Bachelor's degree; 2024-04-19T03:51:43Z; 2024; Final Year Project (FYP); Dhanyamraju, H. R. (2024). Generalization capacity of natural language video localization (NLVL) models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175072; en; SCSE23-0662; application/pdf; Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science Temporal sentence grounding in video Natural language video localisation Generalization Multimodal Video moment retrieval Bias Deep learning Artificial intelligence NLVL TSGV VMR Charades-STA-merged Charades-ego-STA Charades-STA Computer vision Natural language processing Information retrieval
description	Generalization is a critical feature of any machine learning model. Natural Language Video Localization (NLVL) tasks involve processing diverse video content, text queries, and timestamp distributions, making generalization a crucial aspect of model performance. Many NLVL datasets, such as Charades-STA, exhibit distributional biases in both the timestamps associated with actions in videos and the corresponding textual queries. This bias poses a significant obstacle to building robust models with strong generalization capabilities. In this study, we conducted a comprehensive evaluation of NLVL models across various perturbation scenarios to assess their robustness and sensitivities. Leveraging synthetic perturbation sets, including textual, positional, and stylistic alterations, we examined a model's performance and elucidated its strengths, weaknesses, and underlying mechanisms. Our findings revealed nuanced patterns, highlighting the model's resilience to certain perturbations, such as character swaps, while showcasing heightened sensitivity to others, such as text style variations. Additionally, we explored the implications of dataset curation on model performance, demonstrating the effectiveness of bias mitigation techniques in reducing distributional bias within datasets. Furthermore, we introduced two new datasets, Charades-STAMerged and Charades-Ego STA, aimed at mitigating distributional bias and evaluating NLVL models' generalization on first-person video data. Through these efforts, we offer valuable insights into the performance and interpretability of NLVL models, contributing to the enhancement of model robustness, fairness, and applicability in real-world scenarios.
author2	Sun Aixin
author_facet	Sun Aixin Dhanyamraju, Harsh Rao
format	Final Year Project
author	Dhanyamraju, Harsh Rao
author_sort	Dhanyamraju, Harsh Rao
title	Generalization capacity of natural language video localization (NLVL) models
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/175072
_version_	1814047018568384512
