Spatial context-aware object-attentional network for multi-label image classification

Bibliographic Details
Main Authors: Zhang, Jialu; Ren, Jianfeng; Zhang, Qian; Liu, Jiang; Jiang, Xudong
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2024
Online Access: https://hdl.handle.net/10356/174558
Institution: Nanyang Technological University
Description
Summary: Multi-label image classification is a fundamental but challenging task in computer vision. Label-related semantic information is often exploited to tackle the problem, but the background context and the spatial semantic information of related objects are not fully utilized. To address these issues, a multi-branch deep neural network is proposed in this paper. The first branch extracts discriminant information from regions of interest to detect target objects. In the second branch, a spatial context-aware approach uses an adaptive patch expansion mechanism to better capture the contextual information of an object in its surroundings. This helps detect small objects that are easily lost without the support of context information. The third, object-attentional branch exploits the spatial semantic relations between the target object and its related objects, so that partially occluded, small, or dim objects can be better detected with the support of more easily detectable objects. To better encode such relations, an attention mechanism that jointly considers the spatial and semantic relations between objects is developed. The proposed framework is evaluated on two widely used benchmark datasets for multi-label classification, MS COCO and PASCAL VOC. The experimental results demonstrate that the proposed method outperforms the state-of-the-art methods for multi-label image classification.
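
The summary does not say how the adaptive patch expansion is computed. As a rough illustration of the general idea only, and not the authors' method, the sketch below grows a region of interest to take in surrounding context and expands small regions more than large ones. The function name `expand_patch`, its parameters, and the rule of doubling the ratio for small boxes are all assumptions made for this example.

```python
def expand_patch(box, image_size, base_ratio=0.25, small_area_frac=0.05):
    """Expand a box (x1, y1, x2, y2) to include surrounding context.

    Hypothetical rule, not taken from the paper: a box covering less than
    `small_area_frac` of the image gets twice the base expansion ratio.
    """
    x1, y1, x2, y2 = box
    width, height = image_size
    box_w, box_h = x2 - x1, y2 - y1

    # Fraction of the image covered by the original box.
    area_frac = (box_w * box_h) / (width * height)
    ratio = base_ratio * 2 if area_frac < small_area_frac else base_ratio

    # Grow the box on every side, then clip to the image bounds.
    dx, dy = box_w * ratio, box_h * ratio
    return (
        max(0.0, x1 - dx),
        max(0.0, y1 - dy),
        min(float(width), x2 + dx),
        min(float(height), y2 + dy),
    )


small = expand_patch((10, 10, 20, 20), (100, 100))  # small box, ratio 0.5
large = expand_patch((0, 0, 80, 80), (100, 100))    # large box, ratio 0.25, clipped
```

Under these assumptions, the small box gains a margin of half its side length on each side, while the large box gains a quarter and is clipped at the image border.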