Enhancing performance in video grounding tasks through the use of captions
This report explores enhancing video grounding by utilizing generated captions, addressing the challenge posed by sparse annotations in video datasets. We took inspiration from the PCNet model, which uses caption-guided attention to fuse captions generated by Parallel Dynamic Video Captioni...
Main Author: Liu, Xinran
Other Authors: Sun, Aixin
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Online Access: https://hdl.handle.net/10356/175356
Institution: Nanyang Technological University
Similar Items
- Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
  by: WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha, et al.
  Published: (2024)
- Study of ground settlement induced by tunnelling and data enhancement for settlement monitoring using LiDAR
  by: Huang, Lan
  Published: (2024)
- A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
  by: Liu, A.-A., et al.
  Published: (2021)
- Neural image and video captioning
  by: Lam, Ting En
  Published: (2024)
- Large language model enhanced with prompt-based vanilla distillation for sentence embeddings
  by: Wang, Minghao
  Published: (2024)