Paper Title
DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering
Authors
Abstract
Image segmentation is a fundamental task in computer vision. Data annotation for training supervised methods can be labor-intensive, motivating unsupervised methods. Current approaches often rely on extracting deep features from pre-trained networks to construct a graph, and classical clustering methods like k-means and normalized-cuts are then applied as a post-processing step. However, this approach reduces the high-dimensional information encoded in the features to pair-wise scalar affinities. To address this limitation, this study introduces a lightweight Graph Neural Network (GNN) to replace classical clustering methods while optimizing for the same clustering objective function. Unlike existing methods, our GNN takes both the pair-wise affinities between local image features and the raw features as input. This direct connection between the raw features and the clustering objective enables us to implicitly perform classification of the clusters between different graphs, resulting in semantic part segmentation without the need for additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training an image segmentation GNN. Furthermore, we employ the Correlation-Clustering (CC) objective to perform clustering without defining the number of clusters, allowing for k-less clustering. We apply the proposed method for object localization, segmentation, and semantic part segmentation tasks, surpassing state-of-the-art performance on multiple benchmarks.
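To make the idea of "a classical clustering objective as a self-supervised loss" concrete, the relaxed normalized-cut objective over soft cluster assignments can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation: the function name, the `eps` stabilizer, and the two-block toy graph are all illustrative assumptions. In the actual method, `S` would be the softmax output of the GNN and the loss would be minimized by gradient descent.

```python
import numpy as np

def soft_ncut_loss(A, S, eps=1e-9):
    """Relaxed normalized-cut objective for soft cluster assignments.

    A: (n, n) symmetric affinity matrix between graph nodes (image patches).
    S: (n, k) soft assignment matrix, rows summing to 1 (e.g. a softmax).
    Returns k - sum_k (S_k^T A S_k) / (S_k^T D S_k), which is 0 for a
    perfect partition of a block-diagonal affinity matrix and grows as
    clusters cut across strong affinities.
    """
    d = A.sum(axis=1)                            # node degrees
    assoc = np.einsum('nk,nm,mk->k', S, A, S)    # within-cluster affinity per cluster
    degree = np.einsum('nk,n,nk->k', S, d, S)    # cluster volume per cluster
    return S.shape[1] - np.sum(assoc / (degree + eps))

# Toy graph: two disconnected pairs of nodes (a perfectly separable graph).
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

S_good = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # correct split
S_bad = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)   # clusters cut both edges

loss_good = soft_ncut_loss(A, S_good)   # ~0: each cluster keeps all its affinity
loss_bad = soft_ncut_loss(A, S_bad)     # larger: all affinity is cut
```

Because the loss is differentiable in `S`, plugging it into any autograd framework turns the classical objective into a training signal for the GNN, with no labels required.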