Enhancing performance in video grounding tasks through the use of attention module

Bibliographic Details
Main Author: Do Duc Anh
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/181703
Institution: Nanyang Technological University
Description
Summary: This report investigates how attention mechanisms can improve performance on video grounding tasks, tackling the issue of sparse annotations in video datasets. Drawing inspiration from the MMN model \cite{wang2021_negative_2dmap}, we developed a modified model based on the open-source MMN codebase and evaluated it on several widely used datasets, including Charades-STA and ActivityNet Captions. Our approach shows improvements over certain benchmarks. Additionally, we conducted an in-depth analysis of the role attention plays in enhancing the multimodal framework's ability to comprehend the complex structure of videos.