Dynamic temporal filtering in video models

Video temporal dynamics are conventionally modeled with a 3D spatial-temporal kernel or its factorized version, which comprises a 2D spatial kernel and a 1D temporal kernel. The modeling power, nevertheless, is limited by the fixed window size and static weights of a kernel along the temporal dimension. The pr...

Bibliographic Details
Main Authors:	LONG, Fuchen, QIU, Zhaofan, PAN, Yingwei, YAO, Ting, NGO, Chong-wah, MEI, Tao
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2022
Subjects:	Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/7509 https://ink.library.smu.edu.sg/context/sis_research/article/8512/viewcontent/136950470.pdf
Institution:	Singapore Management University

Similar Items

MLP-3D: A MLP-like 3D architecture with grouped time mixing
by: QIU, Zhaofan, et al.
Published: (2022)

Learning temporal dynamics in videos with image transformer
by: SHU, Yan, et al.
Published: (2024)

Wave-ViT: Unifying wavelet and transformers for visual representation learning
by: YAO, Ting, et al.
Published: (2022)

Zero-shot ingredient recognition by multi-relational graph convolutional network
by: CHEN, Jingjing, et al.
Published: (2020)

Feature prediction diffusion model for video anomaly detection
by: YAN, Cheng, et al.
Published: (2023)

Self-trained deep ordinal regression for end-to-end video anomaly detection
by: PANG, Guansong, et al.
Published: (2020)

Learning spatio-temporal representation with local and global diffusion
by: QIU, Zhaofan, et al.
Published: (2019)

Condensing a sequence to one informative frame for video recognition
by: QIU, Zhaofan, et al.
Published: (2021)

PosMLP-Video: Spatial and temporal relative position encoding for efficient video recognition
by: HAO, Yanbin, et al.
Published: (2024)

Serendipity-driven celebrity video hyperlinking
by: YANG, Shujun, et al.
Published: (2016)

Outlier-robust tensor PCA
by: ZHOU, Pan, et al.
Published: (2016)

Rushes video summarization by object and event understanding
by: WANG, Feng, et al.
Published: (2007)

Adversarial meta sampling for multilingual low-resource speech recognition
by: XIAO, Yubei, et al.
Published: (2021)

How important is the train-validation split in meta-learning?
by: BAI, Yu, et al.
Published: (2021)

Long-term leap attention, short-term periodic shift for video classification
by: ZHANG, Hao, et al.
Published: (2022)

Video event detection using motion relativity and visual relatedness
by: WANG, Feng, et al.
Published: (2008)

CONQUER: Contextual query-aware ranking for video corpus moment retrieval
by: HOU, Zhijian, et al.
Published: (2021)

Towards improving system performance in large scale multi-agent systems with selfish agents
by: KUMAR, Rajiv Ranjan
Published: (2022)

Visual Commonsense R-CNN
by: WANG, Tan, et al.
Published: (2020)

Knowledge-aware multimodal fashion chatbot
by: LIAO, Lizi, et al.
Published: (2018)

Debiasing NLU models via causal intervention and counterfactual reasoning
by: TIAN, Bing, et al.
Published: (2022)

Gesture enhanced comprehension of ambiguous human-to-robot instructions
by: WEERAKOON MUDIYANSELAGE DULANGA KAVEESHA WEERAKOON, et al.
Published: (2020)

Self-supervised multi-class pre-training for unsupervised anomaly detection and segmentation in medical images
by: TIAN, Yu, et al.
Published: (2021)

EdgeDuet: Tiling small object detection for edge assisted autonomous mobile vision
by: WANG, Xu, et al.
Published: (2021)

Global context aware convolutions for 3D point cloud understanding
by: ZHANG, Zhiyuan, et al.
Published: (2020)

Symmetry robust descriptor for non-rigid surface matching
by: ZHANG, Zhiyuan, et al.
Published: (2013)

Pixel-wise energy-biased abstention learning for anomaly segmentation on complex urban driving scenes
by: TIAN, Yu, et al.
Published: (2022)

Reducing adaptation latency for multi-concept visual perception in outdoor environments
by: WIGNESS, Maggie, et al.
Published: (2016)

GDFace: Gated deformation for multi-view face image synthesis
by: XU, Xuemiao, et al.
Published: (2020)

Test-time augmentation for 3D point cloud classification and segmentation
by: VU, Tuan-Anh, et al.
Published: (2024)

ImageInThat: Manipulating images to convey user instructions to robots
by: MAHADEVAN, Karthik, et al.
Published: (2025)

Synthesizing multi-person and rare pose images for human pose estimation
by: ZHAO, Liuqing, et al.
Published: (2025)

Towards textually describing complex video contents with audio-visual concept classifiers
by: TAN, Chun Chet, et al.
Published: (2011)

Interactive video corpus moment retrieval using reinforcement learning
by: MA, Zhixin, et al.
Published: (2022)

Reinforcement learning-based interactive video search
by: MA, Zhixin, et al.
Published: (2022)

Exploring category-agnostic clusters for open-set domain adaptation
by: PAN, Yingwei, et al.
Published: (2020)

Learning to hallucinate face images via component generation and enhancement
by: SONG, Yibing, et al.
Published: (2017)

VireoJD-MM @ TRECVID 2019: Activities in extended video (ACTEV)
by: HOU, Zhijian, et al.
Published: (2019)

DualFormer: Local-global stratified transformer for efficient video recognition
by: LIANG, Yuxuan, et al.
Published: (2022)

Semi-supervised domain adaptation with subspace learning for visual recognition
by: YAO, Ting, et al.
Published: (2015)