Generalization capacity of natural language video localization (NLVL) models
Main Author: | Dhanyamraju, Harsh Rao |
---|---|
Other Authors: | Sun Aixin |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2024 |
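Subjects: | Computer and Information Science; Temporal sentence grounding in video; Natural language video localisation; Generalization; Multimodal; Video moment retrieval; Bias; Deep learning; Artificial intelligence; NLVL; TSGV; VMR; Charades-STA-merged; Charades-ego-STA; Charades-STA; Computer vision; Natural language processing; Information retrieval |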
Online Access: | https://hdl.handle.net/10356/175072 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-175072 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-175072; 2024-04-19T15:42:08Z; Generalization capacity of natural language video localization (NLVL) models; Dhanyamraju, Harsh Rao; Sun Aixin; School of Computer Science and Engineering; AXSun@ntu.edu.sg; Bachelor's degree; 2024-04-19T03:51:43Z; 2024; Final Year Project (FYP); Dhanyamraju, H. R. (2024). Generalization capacity of natural language video localization (NLVL) models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175072; en; SCSE23-0662; application/pdf; Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
topic | Computer and Information Science; Temporal sentence grounding in video; Natural language video localisation; Generalization; Multimodal; Video moment retrieval; Bias; Deep learning; Artificial intelligence; NLVL; TSGV; VMR; Charades-STA-merged; Charades-ego-STA; Charades-STA; Computer vision; Natural language processing; Information retrieval |
description | Generalization is a critical property of any machine learning model. Natural Language Video Localization (NLVL) tasks involve diverse video content, text queries, and timestamp distributions, which makes generalization central to model performance. Many NLVL datasets, such as Charades-STA, exhibit distributional biases both in the timestamps of the actions in their videos and in the corresponding text queries. These biases are a significant obstacle to building robust models that generalize well. In this study, we conducted a comprehensive evaluation of NLVL models across a range of perturbation scenarios to assess their robustness and sensitivities. Using synthetic perturbation sets that include textual, positional, and stylistic alterations, we examined the model's performance and elucidated its strengths, weaknesses, and underlying mechanisms. Our findings revealed nuanced patterns: the model was resilient to some perturbations, such as character swaps, but highly sensitive to others, such as text style variations. We also examined how dataset curation affects model performance and demonstrated that bias mitigation techniques can effectively reduce distributional bias within datasets. Furthermore, we introduced two new datasets, Charades-STAMerged and Charades-Ego STA, aimed at mitigating distributional bias and at evaluating how well NLVL models generalize to first-person video data. Together, these efforts offer insights into the performance and interpretability of NLVL models and contribute to improving their robustness, fairness, and applicability in real-world scenarios. |
author2 | Sun Aixin |
author | Dhanyamraju, Harsh Rao |