A thorough benchmark and a new model for light field saliency detection

Bibliographic Details
Main Authors: Gao, Wei; Fan, Songlin; Li, Ge; Lin, Weisi
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Online Access: https://hdl.handle.net/10356/172179
Institution: Nanyang Technological University
Description
Summary: Compared with current RGB or RGB-D saliency detection datasets, those for light field saliency detection often suffer from several defects, e.g., insufficient data volume and diversity, incomplete data formats, and coarse annotations, which impede progress in this field. To address these issues, we build a large-scale light field dataset, dubbed PKU-LF, comprising 5,000 light fields and covering diverse indoor and outdoor scenes. PKU-LF provides all-inclusive representation formats of light fields and offers a unified platform for comparing algorithms that use different input formats. To stimulate new work on saliency detection, we present many unexplored scenarios (such as underwater and high-resolution scenes) and the richest annotations (such as scribble annotations, bounding boxes, object-/instance-level annotations, and edge annotations), on which many potential attention modeling tasks can be investigated. To facilitate the development of saliency detection, we systematically evaluate and analyze 16 representative 2D, 3D, and 4D methods on four existing datasets and the proposed dataset, furnishing a thorough benchmark. Furthermore, tailored to the distinct structural characteristics of light fields, a novel symmetric two-stream architecture (STSA) network is proposed to predict the saliency of light fields more accurately. Specifically, STSA incorporates a focalness interweavement module (FIM) and three partial decoder modules (PDM). The former is designed to efficiently establish long-range dependencies across focal slices, while the latter aims to effectively aggregate the extracted coadjutant features in a mutually enhancing way. Extensive experiments demonstrate that our method significantly outperforms competing methods.
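
Note: The summary says the FIM establishes long-range dependencies across focal slices but does not say how. One common way to model such dependencies is self-attention along the slice axis. The sketch below illustrates that general idea only and is not the authors' FIM. The function names (`softmax`, `attend_across_slices`), the per-slice feature vectors, and the toy numbers are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_across_slices(slice_features):
    """Mix each focal slice's feature vector with every other slice.

    slice_features: list of N equal-length feature vectors, one per
    focal slice (a hypothetical, heavily simplified representation).
    Returns N vectors, each a softmax-weighted sum over all N inputs,
    so every slice can draw on any other slice regardless of distance
    in the focal stack.
    """
    dim = len(slice_features[0])
    scale = math.sqrt(dim)  # scaled dot-product attention
    mixed = []
    for query in slice_features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in slice_features
        ]
        weights = softmax(scores)
        mixed.append([
            sum(w * value[d] for w, value in zip(weights, slice_features))
            for d in range(dim)
        ])
    return mixed


# Toy focal stack: 3 slices with 2-dimensional features (made-up numbers).
focal_stack = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed_stack = attend_across_slices(focal_stack)
```

Each output vector is a convex combination of the input slices, so it stays within the range of the inputs. A real network would add learned query, key, and value projections and operate on spatial feature maps rather than single vectors.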