PosMLP-Video: Spatial and temporal relative position encoding for efficient video recognition

In recent years, vision Transformers and MLPs have demonstrated remarkable performance in image understanding tasks. However, their inherently dense computational operators, such as self-attention and token-mixing layers, pose significant challenges when applied to spatio-temporal video data. To add...

Bibliographic Details
Main Authors:	HAO, Yanbin, ZHOU, Diansong, WANG, Zhicai, NGO, Chong-wah, HE, Xiangnan, WANG, Meng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Multi-layer perceptron; Positional encoding; Spatio-temporal modeling; Video recognition; Artificial Intelligence and Robotics; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/8256 https://ink.library.smu.edu.sg/context/sis_research/article/9259/viewcontent/PosMLP_preprint_pvoa_cc_by.pdf
Institution:	Singapore Management University

Similar Items

Architecture analysis of MLP by geometrical interpretation
by: Xiang, C., et al.
Published: (2014)

Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices
by: WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha, et al.
Published: (2024)

Video partitioning by temporal slice coherency
by: NGO, Chong-wah, et al.
Published: (2001)

Video segmentation: Temporally-constrained graph-based optimization
by: LIU SIYING
Published: (2010)

Video Encoder Optimization for Real-Time Communication
by: TAN YIH HAN
Published: (2011)

SenseCoding: Accelerometer-assisted motion estimation for efficient video encoding
by: Hong, G., et al.
Published: (2013)

Feature selection via sensitivity analysis of MLP probabilistic outputs
by: Yang, J.-B., et al.
Published: (2014)

Relation Understanding in Videos: A Grand Challenge Overview
by: Xindi Shang, et al.
Published: (2020)

A framework for mining topological patterns in spatio-temporal databases
by: Wang, J., et al.
Published: (2013)

Spatio-temporal phrases for activity recognition
by: Zhang Y., et al.
Published: (2018)

Association pattern mining in spatio-temporal databases
by: WANG JUNMEI
Published: (2010)

Geometrical interpretation and architecture selection of MLP
by: Xiang, C., et al.
Published: (2014)

Age-related changes in relational encoding
by: Leow, Dayton Wei Yang
Published: (2013)

A brain-inspired spiking neural network model with temporal encoding and learning
by: Yu, Q., et al.
Published: (2014)

A spatio-temporal autoregressive model for multi-unit residential market analysis
by: Sun, H., et al.
Published: (2013)

SPATIO-TEMPORAL MODELS FOR FORECASTING FOOD DELIVERY DEMAND
by: ANG PENG SENG
Published: (2020)

Decoding-workload-aware video enc
by: Huang, Y., et al.
Published: (2013)

Sensor-assisted video encoding for mobile devices in real-world environments
by: Chen, X., et al.
Published: (2013)

Workload model for video decoding and its applications
by: HUANG YICHENG
Published: (2010)

Temporal consistent video editing using diffusion models
by: Bai, Shun Yao
Published: (2024)

Play and Rewind: Optimizing Binary Representations of Videos by Self-Supervised Temporal Hashing
by: Hanwang Zhang, et al.
Published: (2020)

Adaptive encoding of zoomable video streams based on user access pattern
by: Quang Minh Khiem, N., et al.
Published: (2013)

Weakly supervised video anomaly detection and localization with spatio-temporal prompts
by: WU, Peng, et al.
Published: (2026)

SaVE: Sensor-assisted motion estimation for efficient h.264/AVC video encoding
by: Chen, X., et al.
Published: (2013)

A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
by: Liu, A.-A., et al.
Published: (2021)

Spatio-temporal analysis of the main dengue vector populations in Singapore
by: SUN HAOYANG, et al.
Published: (2021)

Multistage spatio-temporal networks for robust sketch recognition
by: Li, Hanhui, et al.
Published: (2022)

Semi-CNN architecture for effective spatio-temporal learning in action recognition
by: Leong, Mei Chee, et al.
Published: (2021)

Spatio-temporal depth recovery of dynamic scenes with multiple handheld cameras
by: Jiang, H., et al.
Published: (2014)

Learning to segment a video to clips based on scene and camera motion
by: Kowdle A., et al.
Published: (2018)

Parameterized Spatio-Textual Publish/Subscribe in Road Sensor Networks
by: Li, Yanhong, et al.
Published: (2018)

Spatio-temporal feature fusion for real-time prediction of TBM operating parameters: a deep learning approach
by: Fu, Xianlei, et al.
Published: (2022)

Spatial-temporal episodic memory modeling for ADLs: Encoding, retrieval, and prediction
by: SONG, Xinjing, et al.
Published: (2023)

Exploring driving force factors of building energy use and GHG emission using a spatio-temporal regression method
by: Zhang, Yan, et al.
Published: (2023)

Feature selection for MLP neural network: The use of random permutation of probabilistic outputs
by: Yang, J.-B., et al.
Published: (2014)

Integrating spatio-temporal context with multiview representation for object recognition in visual surveillance
by: Liu, X., et al.
Published: (2014)

Motion-based video representation for scene change detection
by: NGO, Chong-wah, et al.
Published: (2002)

Spatio-temporal modeling and analysis of fMRI data using NARX neural network
by: Luo, H., et al.
Published: (2014)

A spatio-temporal filtering approach to denoising of single-trial ERP in rapid image triage
by: Yu, K., et al.
Published: (2014)