Title
Timestamp-Supervised Action Segmentation from the Perspective of Clustering
Authors
Abstract
Video action segmentation under timestamp supervision has recently received much attention because of its low annotation cost. Most existing methods generate pseudo-labels for every frame of each video to train the segmentation model. However, these methods suffer from incorrect pseudo-labels, especially for semantically unclear frames in the transition region between two consecutive actions, which we call ambiguous intervals. To address this issue, we propose a novel framework from the perspective of clustering, consisting of two parts. First, pseudo-label ensembling generates incomplete but high-quality pseudo-label sequences in which frames in ambiguous intervals are left without pseudo-labels. Second, iterative clustering repeatedly propagates the pseudo-labels into the ambiguous intervals by clustering, thereby updating the pseudo-label sequences used to train the model. We further introduce a clustering loss that encourages the features of frames within the same action segment to be more compact. Extensive experiments show the effectiveness of our method.
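To make the idea of an "incomplete but high-quality" pseudo-label sequence concrete, here is a minimal sketch of one way to ensemble pseudo-labels. It is not the paper's exact procedure. It assumes that several candidate frame-wise pseudo-label sequences have already been produced for the same video by different generators. A frame keeps its label only when every candidate agrees, and frames where candidates disagree are treated as ambiguous and marked with a hypothetical `UNLABELED = -1` sentinel. The function name `ensemble_pseudo_labels` is illustrative.

```python
from typing import List, Sequence

UNLABELED = -1  # hypothetical marker for frames left without a pseudo-label


def ensemble_pseudo_labels(candidates: Sequence[Sequence[int]]) -> List[int]:
    """Keep a frame's label only where all candidate sequences agree.

    Assumption: `candidates` holds frame-wise pseudo-label sequences for one
    video, produced by different (unspecified) pseudo-label generators.
    """
    if not candidates:
        raise ValueError("need at least one candidate sequence")
    length = len(candidates[0])
    if any(len(c) != length for c in candidates):
        raise ValueError("all candidate sequences must have the same length")

    ensembled: List[int] = []
    for frame_labels in zip(*candidates):
        first = frame_labels[0]
        # Unanimous frames are trusted; any disagreement marks the frame ambiguous.
        if all(label == first for label in frame_labels):
            ensembled.append(first)
        else:
            ensembled.append(UNLABELED)
    return ensembled


# Three generators disagree only near the 0 -> 1 transition.
print(ensemble_pseudo_labels([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]))  # [0, 0, -1, -1, 1, 1]
```

The unlabeled frames in the output form exactly the kind of ambiguous interval around an action boundary that the abstract describes.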
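The propagation step can be illustrated with a small clustering sketch. This is an assumed formulation, not the authors' algorithm. Each unlabeled gap between two labeled segments is assumed to contain a single action boundary. Every split point is tried, and the one is kept whose two-cluster assignment (frames before the split join the left action, frames after it join the right action) has the smallest total squared distance to the centroids of the adjacent labeled segments. Gaps at the start or end of a video simply take the neighbouring label. In an iterative scheme, one would retrain the model on the filled sequence, extract new features, and repeat. The names `propagate_into_gaps` and `UNLABELED` are hypothetical.

```python
from typing import List, Sequence

UNLABELED = -1  # hypothetical marker for frames without a pseudo-label


def _sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def _centroid(vectors: List[Sequence[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def propagate_into_gaps(labels: Sequence[int],
                        feats: Sequence[Sequence[float]]) -> List[int]:
    """Fill every unlabeled gap, assuming one action boundary per gap."""
    out = list(labels)
    t, total = 0, len(out)
    while t < total:
        if out[t] != UNLABELED:
            t += 1
            continue
        start = t
        while t < total and out[t] == UNLABELED:
            t += 1
        end = t  # the gap covers frames [start, end)
        left = out[start - 1] if start > 0 else None
        right = out[end] if end < total else None
        if left is None and right is None:
            continue  # the whole video is unlabeled: nothing to propagate
        if left is None or right is None:
            fill = right if left is None else left
            out[start:end] = [fill] * (end - start)
            continue
        # Contiguous labeled segments adjacent to the gap define two clusters.
        i = start - 1
        while i >= 0 and out[i] == left:
            i -= 1
        j = end
        while j < total and out[j] == right:
            j += 1
        c_left = _centroid([feats[k] for k in range(i + 1, start)])
        c_right = _centroid([feats[k] for k in range(end, j)])
        # Try every split point; frames before it are assigned to `left`.
        n = end - start
        best_k, best_cost = 0, float("inf")
        for k in range(n + 1):
            cost = sum(_sq_dist(feats[start + m], c_left) for m in range(k))
            cost += sum(_sq_dist(feats[start + m], c_right) for m in range(k, n))
            if cost < best_cost:
                best_k, best_cost = k, cost
        out[start:start + best_k] = [left] * best_k
        out[start + best_k:end] = [right] * (n - best_k)
    return out


feats = [[0.0], [0.1], [0.2], [0.9], [1.0], [1.1]]
print(propagate_into_gaps([0, 0, -1, -1, 1, 1], feats))  # [0, 0, 0, 1, 1, 1]
```

Restricting each gap to a single boundary keeps the filled labels temporally contiguous. Plain k-means on the gap frames would not guarantee this.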
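A clustering loss that "encourages the features of frames within the same action segment to be more compact" can be sketched as a within-segment variance term. The exact form used in the paper may differ. This version splits a pseudo-label sequence into contiguous segments, skips unlabeled frames (marked with a hypothetical `UNLABELED = -1`), and averages each labeled frame's squared distance to its segment centroid. A training implementation would compute the same quantity on framework tensors so that it is differentiable. Plain Python lists are used here only to keep the sketch self-contained. The name `clustering_loss` is illustrative.

```python
from typing import Sequence

UNLABELED = -1  # hypothetical marker for frames without a pseudo-label


def clustering_loss(labels: Sequence[int],
                    feats: Sequence[Sequence[float]]) -> float:
    """Mean squared distance of labeled frames to their segment centroid."""
    total, count = 0.0, 0
    t, n = 0, len(labels)
    while t < n:
        label, start = labels[t], t
        # Advance to the end of this contiguous run of identical labels.
        while t < n and labels[t] == label:
            t += 1
        if label == UNLABELED:
            continue  # ambiguous frames do not contribute
        segment = [feats[k] for k in range(start, t)]
        centroid = [sum(col) / len(segment) for col in zip(*segment)]
        for f in segment:
            total += sum((a - c) ** 2 for a, c in zip(f, centroid))
            count += 1
    return total / count if count else 0.0


print(clustering_loss([0, 0, 1, 1], [[0.0], [2.0], [5.0], [5.0]]))  # 0.5
```

The loss is zero when every segment's frames share one feature vector. It grows as features inside a segment spread out, so minimizing it pulls same-segment features together.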