Paper Title
Unbiased Scene Graph Generation using Predicate Similarities
Paper Authors
Abstract
Scene graphs are widely applied in computer vision as a graphical representation of relationships between objects shown in images. However, these applications have not yet reached a practical stage of development owing to biased training caused by long-tailed predicate distributions. In recent years, many studies have tackled this problem. In contrast, relatively few works have considered predicate similarities as a distinctive dataset characteristic that also leads to biased predictions. Owing to this characteristic, infrequent predicates (e.g., parked on, covered in) are easily misclassified as closely related frequent predicates (e.g., on, in). Utilizing predicate similarities, we propose a new classification scheme that branches the process into several fine-grained classifiers, one for each group of similar predicates. These classifiers aim to capture the differences among similar predicates in detail. We also introduce the idea of transfer learning to enhance the features of predicates that lack sufficient training samples to learn descriptive representations. The results of extensive experiments on the Visual Genome dataset show that combining our method with an existing debiasing approach greatly improves performance on tail predicates in the challenging SGCls/SGDet tasks. Nonetheless, the overall performance of the proposed approach does not reach that of the current state of the art, so further analysis remains necessary as future work.
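The branched classification scheme described above can be illustrated with a minimal coarse-to-fine sketch: a coarse classifier first selects a group of similar predicates, then a fine-grained classifier dedicated to that group distinguishes among its members. The group names, predicates, and linear classifiers below are hypothetical placeholders, not the paper's actual architecture or groupings.

```python
import numpy as np

# Hypothetical predicate groups: infrequent predicates are grouped with the
# frequent predicates they are easily confused with (e.g., "parked on" vs "on").
GROUPS = {
    "on-like": ["on", "parked on", "standing on"],
    "in-like": ["in", "covered in", "wearing"],
}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(feature, coarse_w, fine_w):
    """Coarse-to-fine predicate classification (illustrative only).

    feature:  (d,) relation feature vector.
    coarse_w: (num_groups, d) weights of the coarse group classifier.
    fine_w:   dict mapping group name -> (group_size, d) weights of the
              fine-grained classifier for that group.
    """
    group_names = list(GROUPS)
    # Step 1: coarse classifier routes the feature to one predicate group.
    group = group_names[int(np.argmax(coarse_w @ feature))]
    # Step 2: the group's fine-grained classifier separates similar predicates.
    probs = softmax(fine_w[group] @ feature)
    return GROUPS[group][int(np.argmax(probs))]
```

In a real model, the coarse and fine classifiers would be learned network heads rather than fixed weight matrices, but the control flow (route to a group, then discriminate within it) is the same.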