Generalization capacity of natural language video localization (NLVL) models

Generalization is a critical feature of any machine learning model. Natural Language Video Localization (NLVL) tasks involve processing diverse video content, text queries, and timestamp distributions, making generalization a crucial aspect of model performance. Many NLVL datasets, such as Charades-...

Bibliographic Details
Main Author: Dhanyamraju, Harsh Rao
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects:
VMR
Online Access: https://hdl.handle.net/10356/175072
Institution: Nanyang Technological University
id sg-ntu-dr.10356-175072
record_format dspace
spelling sg-ntu-dr.10356-175072 2024-04-19T15:42:08Z
School of Computer Science and Engineering
AXSun@ntu.edu.sg
Bachelor's degree
2024-04-19T03:51:43Z
Final Year Project (FYP)
Dhanyamraju, H. R. (2024). Generalization capacity of natural language video localization (NLVL) models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175072
SCSE23-0662
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Temporal sentence grounding in video
Natural language video localisation
Generalization
Multimodal
Video moment retrieval
Bias
Deep learning
Artificial intelligence
NLVL
TSGV
VMR
Charades-STA-merged
Charades-ego-STA
Charades-STA
Computer vision
Natural language processing
Information retrieval
description Generalization is a critical property of any machine learning model. Natural Language Video Localization (NLVL) tasks involve diverse video content, text queries, and timestamp distributions, which makes generalization central to model performance. Many NLVL datasets, such as Charades-STA, exhibit distributional biases in both the timestamps associated with actions in videos and the corresponding textual queries. This bias is a significant obstacle to building robust models that generalize well. In this study, we evaluated NLVL models across a range of perturbation scenarios to assess their robustness and sensitivities. Using synthetic perturbation sets that include textual, positional, and stylistic alterations, we examined a model's performance and characterized its strengths, weaknesses, and underlying mechanisms. The model proved resilient to certain perturbations, such as character swaps, and markedly more sensitive to others, such as text style variations. We also explored how dataset curation affects model performance, demonstrating that bias mitigation techniques are effective in reducing distributional bias within datasets. Furthermore, we introduced two new datasets, Charades-STA-merged and Charades-ego-STA, aimed at mitigating distributional bias and at evaluating NLVL models' generalization on first-person video data. Through these efforts, we offer insights into the performance and interpretability of NLVL models, contributing to model robustness, fairness, and applicability in real-world scenarios.
author2 Sun Aixin
author_facet Sun Aixin
Dhanyamraju, Harsh Rao
format Final Year Project
author Dhanyamraju, Harsh Rao
author_sort Dhanyamraju, Harsh Rao
title Generalization capacity of natural language video localization (NLVL) models
publisher Nanyang Technological University
publishDate 2024
url https://hdl.handle.net/10356/175072
_version_ 1814047018568384512