Paper Title

Weakly Supervised Video Salient Object Detection via Point Supervision

Authors

Shuyong Gao, Haozhe Xing, Wei Zhang, Yan Wang, Qianyu Guo, Wenqiang Zhang

Abstract

Video salient object detection models trained on pixel-wise dense annotations have achieved excellent performance, yet obtaining pixel-by-pixel annotated datasets is laborious. Several works attempt to mitigate this problem with scribble annotations, but point supervision, a more labor-saving annotation method (indeed the most labor-saving among manual annotation methods for dense prediction), has not been explored. In this paper, we propose a strong baseline model based on point supervision. To infer saliency maps with temporal information, we mine inter-frame complementary information from both short-term and long-term perspectives. Specifically, we propose a hybrid token attention module, which mixes optical flow and image information from orthogonal directions, adaptively highlighting critical optical-flow information (channel dimension) and critical token information (spatial dimension). To exploit long-term cues, we develop the Long-term Cross-Frame Attention module (LCFA), which helps the current frame infer salient objects based on multi-frame tokens. Furthermore, we label two point-supervised datasets, P-DAVIS and P-DAVSOD, by relabeling the DAVIS and DAVSOD datasets. Experiments on six benchmark datasets show that our method outperforms previous state-of-the-art weakly supervised methods and is even comparable with some fully supervised approaches. Source code and datasets are available.
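The abstract describes the hybrid token attention module only at a high level: channel attention highlights critical optical-flow information while spatial (token) attention highlights critical image tokens, and the two streams are mixed. Below is a minimal PyTorch sketch of that idea; the module and variable names (HybridTokenAttention, flow_tokens, img_tokens) and the gating layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HybridTokenAttention(nn.Module):
    """Sketch: mix image and optical-flow token streams from orthogonal directions."""
    def __init__(self, dim: int):
        super().__init__()
        # Channel attention: pool over tokens, re-weight channels of the flow stream.
        self.channel_gate = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.ReLU(inplace=True),
            nn.Linear(dim // 4, dim), nn.Sigmoid(),
        )
        # Spatial (token) attention: score each token of the image stream.
        self.token_gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.proj = nn.Linear(dim, dim)

    def forward(self, img_tokens: torch.Tensor, flow_tokens: torch.Tensor):
        # img_tokens, flow_tokens: (B, N, C) token sequences from two encoders.
        chan_w = self.channel_gate(flow_tokens.mean(dim=1))   # (B, C)
        flow_hl = flow_tokens * chan_w.unsqueeze(1)           # highlight key flow channels
        tok_w = self.token_gate(img_tokens)                   # (B, N, 1)
        img_hl = img_tokens * tok_w                           # highlight key tokens
        return self.proj(img_hl + flow_hl)                    # mixed representation

# Usage on dummy data:
hta = HybridTokenAttention(dim=64)
out = hta(torch.randn(2, 196, 64), torch.randn(2, 196, 64))  # -> (2, 196, 64)
```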
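For the long-term cues, the abstract says LCFA lets the current frame infer salient objects from multi-frame tokens. A hedged sketch of that pattern follows, using the current frame's tokens as queries over tokens gathered from multiple reference frames; torch.nn.MultiheadAttention is a stand-in here, since the paper's LCFA internals are not specified in the abstract.

```python
import torch
import torch.nn as nn

class LongTermCrossFrameAttention(nn.Module):
    """Sketch: current-frame tokens attend to tokens from multiple reference frames."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cur_tokens: torch.Tensor, ref_tokens: torch.Tensor):
        # cur_tokens: (B, N, C) tokens of the current frame.
        # ref_tokens: (B, T*N, C) tokens concatenated from T reference frames.
        attended, _ = self.attn(cur_tokens, ref_tokens, ref_tokens)
        return self.norm(cur_tokens + attended)  # residual connection + norm

# Usage: 4 reference frames of 196 tokens each.
lcfa = LongTermCrossFrameAttention(dim=64)
cur = torch.randn(2, 196, 64)
refs = torch.randn(2, 4 * 196, 64)
out = lcfa(cur, refs)  # -> (2, 196, 64)
```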
