ObjectFusion: Multi-modal 3D object detection with object-centric fusion

Recent progress on multi-modal 3D object detection has featured BEV (Bird's-Eye-View) based fusion, which effectively unifies both LiDAR point clouds and camera images in a shared BEV space. Nevertheless, the camera-to-BEV transformation is non-trivial due to the inherently ambiguous depth estimation of each pixel, resulting in spatial misalignment between the two modalities' features. Moreover, such transformation inevitably distorts camera image features when projecting them into BEV space. In this paper, we propose a novel Object-centric Fusion (ObjectFusion) paradigm, which dispenses with the camera-to-BEV transformation during fusion and instead aligns object-centric features across modalities for 3D object detection. ObjectFusion first learns three kinds of modality-specific feature maps (i.e., voxel, BEV, and image features) from LiDAR point clouds, their BEV projections, and camera images. A set of 3D object proposals is then produced from the BEV features via a heatmap-based proposal generator. Next, the 3D object proposals are reprojected back into the voxel, BEV, and image spaces, and voxel pooling and RoI pooling are used to generate spatially aligned object-centric features for each modality. The object-centric features of the three modalities are finally fused at the object level and fed into the detection heads. Extensive experiments on the nuScenes dataset demonstrate the superiority of ObjectFusion, which achieves 69.8% mAP on the nuScenes validation set, improving over BEVFusion by 1.3%.
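The abstract walks through a concrete pipeline: modality-specific feature maps, heatmap-based proposals, per-proposal pooling, and object-level fusion. As a rough illustration of the final fusion step only, here is a minimal PyTorch sketch; the class name, channel sizes, and MLP-based fusion are assumptions made for illustration, not the authors' released implementation, and the voxel/RoI pooling that produces the per-proposal inputs is not shown:

```python
# Minimal, self-contained sketch of object-level fusion as described in
# the abstract. Each input is one pooled feature vector per 3D proposal
# per modality, assumed already spatially aligned by the (not shown)
# voxel pooling / RoI pooling step. All names and sizes are hypothetical.
import torch
import torch.nn as nn


class ObjectLevelFusion(nn.Module):
    """Fuse per-proposal voxel, BEV, and image features at object level."""

    def __init__(self, c_voxel=64, c_bev=128, c_img=256, c_out=256):
        super().__init__()
        # A simple MLP over the concatenated modalities; the actual
        # fusion operator in the paper may differ.
        self.fuse = nn.Sequential(
            nn.Linear(c_voxel + c_bev + c_img, c_out),
            nn.ReLU(inplace=True),
            nn.Linear(c_out, c_out),
        )

    def forward(self, f_voxel, f_bev, f_img):
        # Inputs: (num_proposals, C_m) per modality.
        # Output: (num_proposals, c_out), one fused feature per proposal,
        # which would then be fed into the detection heads.
        return self.fuse(torch.cat([f_voxel, f_bev, f_img], dim=-1))


if __name__ == "__main__":
    n = 200  # e.g., 200 heatmap-based proposals
    fusion = ObjectLevelFusion()
    out = fusion(torch.randn(n, 64), torch.randn(n, 128), torch.randn(n, 256))
    print(out.shape)  # torch.Size([200, 256])
```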

Bibliographic Details
Main Authors: CAI, Q., PAN, Y., YAO, T., NGO, Chong-wah, MEI, T.
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects: 3D object detection; Multi-modal; Fusion-based approach; Artificial Intelligence and Robotics; Robotics
Online Access:https://ink.library.smu.edu.sg/sis_research/8306
https://ink.library.smu.edu.sg/context/sis_research/article/9309/viewcontent/Cai_ObjectFusion_Multi_modal_3D_Object_Detection_with_Object_Centric_Fusion_ICCV_2023_paper.pdf
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems (InK@SMU)
License: http://creativecommons.org/licenses/by-nc-nd/4.0/