Cross-modal graph with meta concepts for video captioning

Video captioning targets interpreting the complex visual contents as text descriptions, which requires the model to fully understand video scenes including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to give object proposals and use the attention...

Bibliographic Details
Main Authors:	Wang, Hao, Lin, Guosheng, Hoi, Steven C. H., Miao, Chunyan
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2022
Subjects:	Engineering::Computer science and engineering; Video Captioning; Vision-and-Language
Online Access:	https://hdl.handle.net/10356/162546
Institution:	Nanyang Technological University

Similar Items

Cross-modal graph with meta concepts for video captioning
by: WANG, Hao, et al.
Published: (2022)

PERSONALIZED VISUAL INFORMATION CAPTIONING
by: WU SHUANG
Published: (2023)

A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
by: Liu, A.-A., et al.
Published: (2021)

Semantic-filtered Soft-Split-Aware video captioning with audio-augmented feature
by: Xu, Yuecong, et al.
Published: (2021)

Dynamic captioning: Video accessibility enhancement for hearing impairment
by: Hong, R., et al.
Published: (2013)

Learning generalized video memory for automatic video captioning
by: CHANG, Poo-Hee, et al.
Published: (2018)

Stack-VS : stacked visual-semantic attention for image caption generation
by: Cheng, Ling, et al.
Published: (2021)

Deconfounded image captioning: a causal retrospect
by: Yang, Xu, et al.
Published: (2022)

More is better : precise and detailed image captioning using online positive recall and missing concepts mining
by: Zhang, Mingxing, et al.
Published: (2020)

Context-aware visual policy network for fine-grained image captioning
by: Zha, Zheng-Jun, et al.
Published: (2022)

Learning to collocate Visual-Linguistic Neural Modules for image captioning
by: Yang, Xu, et al.
Published: (2023)

A Qualitative Study of Closed Captions in English Language Teaching (ELT) YouTube Videos
by: Hernandez, Queenie Mae G., et al.
Published: (2024)

Interactive change-aware transformer network for remote sensing image change captioning
by: Cai, Chen, et al.
Published: (2024)

Image captioning via semantic element embedding
by: ZHANG, Xiaodan, et al.
Published: (2020)

CgT-GAN: CLIP-guided text GAN for image captioning
by: YU, Jiarui, et al.
Published: (2023)

Learning transferable perturbations for image captioning
by: WU, Hanjie, et al.
Published: (2022)

Keyword-driven image captioning via Context-dependent Bilateral LSTM
by: ZHANG, Xiaodan, et al.
Published: (2017)

AmpSum: adaptive multiple-product summarization towards improving recommendation captions
by: TRUONG, Quoc Tuan, et al.
Published: (2022)

Decomposing generation networks with structure prediction for recipe generation
by: Wang, Hao, et al.
Published: (2022)

Learning structural representations for recipe generation and food retrieval
by: Wang, Hao, et al.
Published: (2022)

Efficient cross-modal video retrieval with meta-optimized frames
by: HAN, Ning, et al.
Published: (2024)

Who You Are Decides How You Tell
by: WU SHUANG, et al.
Published: (2020)

Video accessibility enhancement for hearing-impaired users
by: Hong, R., et al.
Published: (2013)

Cross-modal Moment Localization in Videos
by: Liu, Meng, et al.
Published: (2020)

Paired cross-modal data augmentation for fine-grained image-to-text retrieval
by: Wang, Hao, et al.
Published: (2023)

Rights that can’t be heard: Addressing the need for extending closed caption law in social media platforms for COVID-19 pandemic related coverage
by: Daos, Bernadette De Vera
Published: (2020)

Audio captioning and retrieval with improved cross-modal objectives
by: Koh, Andrew Jin Jie
Published: (2023)

Decomposing generation networks with structure prediction for recipe generation
by: WANG, Hao, et al.
Published: (2022)

Weakly supervised segmentation with maximum bipartite graph matching
by: Liu, Weide, et al.
Published: (2021)

Development of obstruction detection system using computer vision for generation of data for video analytics
by: Mendoza, Dion Michael M.
Published: (2018)

Automated localization of affective objects and actions in images via caption text-cum-eye gaze analysis
by: Ramanathan, S., et al.
Published: (2013)

Aligning vision and language for image captioning using deep learning
by: Cai, Chen
Published: (2024)

Improving interpretable embeddings for ad-hoc video search with generative captions and multi-word concept bank
by: WU, Jiaxin, et al.
Published: (2024)

Contrastive video question answering via video graph transformer
by: XIAO, Junbin, et al.
Published: (2023)

R2GAN: Cross-modal recipe retrieval with generative adversarial network
by: ZHU, Bin, et al.
Published: (2019)

Content-aware vision-based vehicle tracking for frame-skipped videos
by: Cempron, Jonathan Paul C.
Published: (2021)

Neural image and video captioning (NIVC)
by: Lee, Jeremy Kian Kiat
Published: (2022)

Tree-augmented cross-modal encoding for complex-query video retrieval
by: YANG, Xun, et al.
Published: (2020)

VRAG: Region Attention Graph for Content-Based Video Retrieval
by: KENNARD NG POOL HUA
Published: (2021)

Dynamic fusion with intra- and inter-modality attention flow for visual question answering
by: GAO, Peng, et al.
Published: (2019)