Gaze prediction based on long short-term memory convolution with associated features of video frames
Gaze prediction is a key issue in visual perception research. It can be used to infer important regions in videos, reducing the amount of computation in the learning and inference of various analysis tasks. Vanilla methods for dynamic video are unable to extract valid features, and the motion information among dynamic video frames is ignored, which leads to poor prediction results. We propose a gaze prediction method based on long short-term memory (LSTM) convolution with associated features of video frames (LSTM-CVFAF). First, by adding learnable central prior knowledge, the proposed method effectively and accurately extracts the spatial information of each frame. Second, an LSTM is deployed to obtain temporal gaze motion features. Finally, the spatial and temporal motion information is fused to generate the gaze prediction maps of the dynamic video. Compared with state-of-the-art models on the DHF1K dataset, the CC, AUC-J, sAUC, and NSS are increased by 5.1%, 0.6%, 38.2%, and 0.5%, respectively.
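The abstract describes a three-stage pipeline: a spatial branch modulated by a learnable center prior, a temporal branch that carries motion information across frames, and a fusion step that produces per-frame gaze maps. The toy NumPy sketch below illustrates only the shape of that pipeline, not the paper's implementation: gradient magnitude stands in for learned spatial features, a fixed Gaussian stands in for the learnable center prior, and an exponential moving average stands in for the ConvLSTM. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def center_prior(h, w, sigma=0.3):
    """Gaussian center-prior map, peaking at the frame center.
    In LSTM-CVFAF the prior is learnable; here sigma is fixed for illustration."""
    ys = np.arange(h) / (h - 1) - 0.5          # normalized row coords in [-0.5, 0.5]
    xs = np.arange(w) / (w - 1) - 0.5          # normalized col coords in [-0.5, 0.5]
    d2 = ys[:, None] ** 2 + xs[None, :] ** 2   # squared distance from center
    return np.exp(-d2 / (2 * sigma ** 2))

def predict_gaze(frames, alpha=0.6):
    """Per-frame 'saliency' (gradient magnitude) weighted by the center prior,
    then temporally smoothed; the EMA is a crude stand-in for the ConvLSTM."""
    h, w = frames[0].shape
    prior = center_prior(h, w)
    state = np.zeros((h, w))
    maps = []
    for f in frames:
        gy, gx = np.gradient(f.astype(float))
        spatial = np.hypot(gy, gx) * prior              # spatial branch x prior
        state = alpha * state + (1 - alpha) * spatial   # temporal recurrence
        maps.append(state / (state.max() + 1e-8))       # normalize to [0, 1]
    return maps
```

A real implementation would replace the gradient-magnitude features with a CNN backbone and the EMA with a convolutional LSTM cell, but the data flow (spatial map × prior → recurrent state → normalized prediction map per frame) follows the stages named in the abstract.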
Main Authors: Xiao, Limei; Zhu, Zizhong; Liu, Hao; Li, Ce; Fu, Wenhao
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Gaze Prediction; Dynamic Video
Online Access: https://hdl.handle.net/10356/172061
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-172061
record_format: dspace
citation: Xiao, L., Zhu, Z., Liu, H., Li, C. & Fu, W. (2023). Gaze prediction based on long short-term memory convolution with associated features of video frames. Computers and Electrical Engineering, 107, 108625. https://dx.doi.org/10.1016/j.compeleceng.2023.108625
issn: 0045-7906
doi: 10.1016/j.compeleceng.2023.108625
handle: https://hdl.handle.net/10356/172061
scopus: 2-s2.0-85149057910
funding: Supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 61866022.
rights: © 2023 Elsevier Ltd. All rights reserved.
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering; Gaze Prediction; Dynamic Video
description: Gaze prediction is a key issue in visual perception research. It can be used to infer important regions in videos, reducing the amount of computation in the learning and inference of various analysis tasks. Vanilla methods for dynamic video are unable to extract valid features, and the motion information among dynamic video frames is ignored, which leads to poor prediction results. We propose a gaze prediction method based on LSTM convolution with associated features of video frames (LSTM-CVFAF). First, by adding learnable central prior knowledge, the proposed method effectively and accurately extracts the spatial information of each frame. Second, an LSTM is deployed to obtain temporal gaze motion features. Finally, the spatial and temporal motion information is fused to generate the gaze prediction maps of the dynamic video. Compared with state-of-the-art models on the DHF1K dataset, the CC, AUC-J, sAUC, and NSS are increased by 5.1%, 0.6%, 38.2%, and 0.5%, respectively.
author2: School of Computer Science and Engineering
author: Xiao, Limei; Zhu, Zizhong; Liu, Hao; Li, Ce; Fu, Wenhao
author_sort: Xiao, Limei
format: Article
title: Gaze prediction based on long short-term memory convolution with associated features of video frames
publishDate: 2023
url: https://hdl.handle.net/10356/172061
_version_: 1783955575763107840