Cross-modal graph with meta concepts for video captioning
Video captioning aims to interpret complex visual content as text descriptions, which requires the model to fully understand video scenes, including objects and their interactions. Prevailing methods adopt off-the-shelf object detection networks to generate object proposals and use the attention...
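As a rough illustration of the attention over detector-produced object proposals that the abstract refers to, here is a minimal PyTorch sketch. It is not the paper's cross-modal graph model; the module name, feature dimensions, and additive-attention formulation below are all assumptions made for illustration.

```python
# Minimal sketch (illustration only, not the paper's method): additive attention
# that weights pre-extracted object-proposal features by a caption decoder state.
import torch
import torch.nn as nn


class ProposalAttention(nn.Module):
    """Attend over N object-proposal features given the decoder hidden state."""

    def __init__(self, feat_dim: int = 2048, hid_dim: int = 512, att_dim: int = 256):
        super().__init__()
        self.proj_feat = nn.Linear(feat_dim, att_dim)  # project proposal features
        self.proj_hid = nn.Linear(hid_dim, att_dim)    # project decoder state
        self.score = nn.Linear(att_dim, 1)             # scalar attention energy

    def forward(self, proposals: torch.Tensor, hidden: torch.Tensor):
        # proposals: (batch, num_proposals, feat_dim), e.g. detector RoI features
        # hidden:    (batch, hid_dim), current decoder hidden state
        energy = torch.tanh(self.proj_feat(proposals) + self.proj_hid(hidden).unsqueeze(1))
        weights = torch.softmax(self.score(energy).squeeze(-1), dim=-1)  # (batch, N)
        context = (weights.unsqueeze(-1) * proposals).sum(dim=1)         # (batch, feat_dim)
        return context, weights


if __name__ == "__main__":
    att = ProposalAttention()
    ctx, w = att(torch.randn(2, 36, 2048), torch.randn(2, 512))
    print(ctx.shape, w.shape)  # torch.Size([2, 2048]) torch.Size([2, 36])
```

In this kind of setup the context vector is fed to the caption decoder at each step, so the generated words can be grounded in specific detected objects rather than in a single global video feature.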
Main Authors: WANG, Hao; LIN, Guosheng; HOI, Steven C. H.; MIAO, Chunyan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2022
Subjects:
Online Access: https://ink.library.smu.edu.sg/sis_research/7245
Institution: Singapore Management University
Similar Items
- Cross-modal graph with meta concepts for video captioning
  by: Wang, Hao, et al.
  Published: (2022)
- PERSONALIZED VISUAL INFORMATION CAPTIONING
  by: WU SHUANG
  Published: (2023)
- Learning transferable perturbations for image captioning
  by: WU, Hanjie, et al.
  Published: (2022)
- More is better: precise and detailed image captioning using online positive recall and missing concepts mining
  by: Zhang, Mingxing, et al.
  Published: (2020)
- A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
  by: Liu, A.-A., et al.
  Published: (2021)