Paper Title

LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs

Authors

Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia

Abstract

Recent advances in 2D CNNs have revealed that large kernels are important. However, directly applying large convolutional kernels in 3D CNNs runs into severe difficulties: module designs that succeed in 2D, including the popular depth-wise convolution, become surprisingly ineffective on 3D networks. To address this vital challenge, we instead propose the spatial-wise partition convolution and its large-kernel module, which avoid the optimization and efficiency issues of naive 3D large kernels. Our large-kernel 3D CNN network, LargeKernel3D, yields notable improvements in the 3D tasks of semantic segmentation and object detection. It achieves 73.9% mIoU on the ScanNetv2 semantic segmentation benchmark and 72.8% NDS on the nuScenes object detection benchmark, ranking 1st on the nuScenes LIDAR leaderboard. The performance further improves to 74.2% NDS with a simple multi-modal fusion. In addition, LargeKernel3D can be scaled to a 17x17x17 kernel size on Waymo 3D object detection. For the first time, we show that large kernels are feasible and essential for 3D visual tasks.
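The abstract names the spatial-wise partition convolution but does not spell out how it works, so the following is only an illustrative sketch, not the authors' implementation. It assumes that "spatial-wise partition" means grouping neighbouring positions of a large kernel so that each group shares one weight. The function names and the even binning rule (`offset * groups // k`) are hypothetical choices made for this example. The sketch shows why a naive dense 3D kernel grows cubically in size, and how sharing weights across spatial groups would keep the number of distinct weights fixed.

```python
from itertools import product


def dense_kernel_weights(k: int) -> int:
    """Spatial weights in a dense k x k x k kernel, per input/output channel pair."""
    return k ** 3


def partition_index(offset: int, k: int, groups: int = 3) -> int:
    """Map a 1D kernel offset in [0, k) to one of `groups` contiguous bins.

    Hypothetical binning rule, chosen only for illustration.
    """
    if not 0 <= offset < k:
        raise ValueError("offset must lie in [0, k)")
    return offset * groups // k


def expand_shared_kernel(shared: dict, k: int, groups: int = 3) -> dict:
    """Expand groups^3 shared weights into a full k^3 kernel.

    `shared` maps a group coordinate (gx, gy, gz) to a weight; the result maps
    every kernel position (x, y, z) to the weight of the group it falls into.
    """
    return {
        (x, y, z): shared[
            (
                partition_index(x, k, groups),
                partition_index(y, k, groups),
                partition_index(z, k, groups),
            )
        ]
        for x, y, z in product(range(k), repeat=3)
    }


if __name__ == "__main__":
    # Dense kernels grow cubically with kernel size.
    for k in (3, 7, 17):
        print(k, dense_kernel_weights(k))

    # A 7x7x7 kernel covered by only 3x3x3 distinct (shared) weights.
    shared = {g: float(i) for i, g in enumerate(product(range(3), repeat=3))}
    full = expand_shared_kernel(shared, k=7)
    print(len(full), len(set(full.values())))
```

Under these assumptions, the kernel covers a 7x7x7 neighbourhood while storing no more distinct weights than a 3x3x3 kernel. Whether the actual method shares weights in exactly this way is not stated in the abstract.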
