Entropy guided attention network for weakly-supervised action localization

One major challenge of Weakly-supervised Temporal Action Localization (WTAL) is to handle diverse backgrounds in videos. To model background frames, most existing methods treat them as an additional action class. However, because background frames usually do not share common semantics, squeezing all...

Bibliographic Details
Main Authors: Cheng, Yi, Sun, Ying, Fan, Hehe, Zhuo, Tao, Lim, Joo-Hwee, Kankanhalli, Mohan
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
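Subjects: Engineering::Computer science and engineering; Temporal Action Localization; Weakly-Supervised Learning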
Online Access: https://hdl.handle.net/10356/164107
Institution: Nanyang Technological University
id sg-ntu-dr.10356-164107
record_format dspace
spelling sg-ntu-dr.10356-164107
2023-01-05T01:14:33Z
Entropy guided attention network for weakly-supervised action localization
Cheng, Yi; Sun, Ying; Fan, Hehe; Zhuo, Tao; Lim, Joo-Hwee; Kankanhalli, Mohan
School of Computer Science and Engineering; Institute for Infocomm Research, A*STAR; Centre for Frontier AI Research, A*STAR
Engineering::Computer science and engineering; Temporal Action Localization; Weakly-Supervised Learning
Agency for Science, Technology and Research (A*STAR)
This research is supported by the Agency for Science, Technology and Research, under the AME Programmatic Funding Scheme A18A2b0046 and the National Natural Science Foundation of China under Grant 62002188.
2023-01-05T01:14:33Z 2023-01-05T01:14:33Z 2022
Journal Article
Cheng, Y., Sun, Y., Fan, H., Zhuo, T., Lim, J. & Kankanhalli, M. (2022). Entropy guided attention network for weakly-supervised action localization. Pattern Recognition, 129, 108718. https://dx.doi.org/10.1016/j.patcog.2022.108718
0031-3203
https://hdl.handle.net/10356/164107
10.1016/j.patcog.2022.108718
2-s2.0-85129325631
129
108718
en
A18A2b0046
Pattern Recognition
© 2022 Elsevier Ltd. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Temporal Action Localization
Weakly-Supervised Learning
description One major challenge of Weakly-supervised Temporal Action Localization (WTAL) is to handle diverse backgrounds in videos. To model background frames, most existing methods treat them as an additional action class. However, because background frames usually do not share common semantics, squeezing all the different background frames into a single class hinders network optimization. Moreover, the network would be confused and tends to fail when tested on videos with unseen background frames. To address this problem, we propose an Entropy Guided Attention Network (EGA-Net) to treat background frames as out-of-domain samples. Specifically, we design a two-branch module, where a domain branch detects whether a frame is an action by learning a class-agnostic attention map, and an action branch recognizes the action category of the frame by learning a class-specific attention map. By aggregating the two attention maps to model the joint domain-class distribution of frames, our EGA-Net can handle varying backgrounds. To train the class-agnostic attention map with only the video-level class labels, we propose an Entropy Guided Loss (EGL), which employs entropy as the supervision signal to distinguish action and background. Moreover, we propose a Global Similarity Loss (GSL) to enhance the action-specific attention map via action class center. Extensive experiments on THUMOS14, ActivityNet1.2 and ActivityNet1.3 datasets demonstrate the effectiveness of our EGA-Net.
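The description above names two ideas that a small numerical sketch can make concrete: entropy as a cue for separating action from background, and the aggregation of a class-agnostic map with a class-specific map. The NumPy code below is only an illustration under stated assumptions and is not the authors' implementation. It assumes that a background (out-of-domain) frame yields a near-uniform class distribution and therefore high entropy, and that the two maps are aggregated by a simple product. The function names `normalized_entropy` and `joint_scores`, the sigmoid/softmax choices, and the toy logits are all hypothetical; the record does not give the exact form of EGL, GSL, or the aggregation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def normalized_entropy(p, eps=1e-12):
    """Shannon entropy of each row of p, divided by log(num_classes).

    Values near 0 mean a confident class prediction; values near 1 mean
    a near-uniform distribution (assumed here to indicate background).
    """
    h = -(p * np.log(p + eps)).sum(axis=-1)
    return h / np.log(p.shape[-1])

def joint_scores(domain_logits, class_logits):
    """Hypothetical aggregation: class-agnostic attention (sigmoid, shape T)
    times class-specific attention (softmax, shape T x C).
    The product is an assumption, not the method from the paper.
    """
    a_domain = 1.0 / (1.0 + np.exp(-domain_logits))
    a_class = softmax(class_logits, axis=-1)
    return a_domain[:, None] * a_class

# Toy input: frame 0 looks like a confident action, frame 1 like background.
class_logits = np.array([[8.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0]])
domain_logits = np.array([4.0, -4.0])

probs = softmax(class_logits)
entropy = normalized_entropy(probs)      # low for frame 0, high for frame 1
scores = joint_scores(domain_logits, class_logits)
```

In this toy setup the background-like frame receives a low score for every class without needing a dedicated background class, which is the intuition the description gives for treating background frames as out-of-domain samples.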
author2 School of Computer Science and Engineering
format Article
author Cheng, Yi
Sun, Ying
Fan, Hehe
Zhuo, Tao
Lim, Joo-Hwee
Kankanhalli, Mohan
author_sort Cheng, Yi
title Entropy guided attention network for weakly-supervised action localization
publishDate 2023
url https://hdl.handle.net/10356/164107
_version_ 1754611265792114688