Learn to Predict Sets Using Feed-Forward Neural Networks

Paper title

Learn to Predict Sets Using Feed-Forward Neural Networks

Authors

Rezatofighi, Hamid; Zhu, Tianyu; Kaskman, Roman; Motlagh, Farbod T.; Shi, Qinfeng; Milan, Anton; Cremers, Daniel; Leal-Taixé, Laura; Reid, Ian

Abstract

This paper addresses the task of set prediction using deep feed-forward neural networks. A set is a collection of elements that is invariant under permutation, and the size of a set is not fixed in advance. Many real-world problems, such as image tagging and object detection, have outputs that are naturally expressed as sets of entities. This creates a challenge for traditional deep neural networks, which naturally deal with structured outputs such as vectors, matrices or tensors. We present a novel approach for learning to predict sets with unknown permutation and cardinality using deep neural networks. In our formulation we define a likelihood for a set distribution represented by a) two discrete distributions defining the set cardinality and permutation variables, and b) a joint distribution over set elements with a fixed cardinality. Depending on the problem under consideration, we define different training models for set prediction using deep neural networks. We demonstrate the validity of our set formulations on relevant vision problems: 1) multi-label image classification, where we outperform the other competing methods on the PASCAL VOC and MS COCO datasets, 2) object detection, for which our formulation outperforms popular state-of-the-art detectors, and 3) a complex CAPTCHA test, where we observe that, surprisingly, our set-based network acquired the ability to mimic arithmetic without any rules being coded.
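The abstract describes a set as having both an unknown size and no inherent order. As an illustration only, the sketch below shows one way a predicted cardinality distribution and per-label scores could be combined into a label set in the multi-label setting. The function name `decode_set`, its two inputs, and the example numbers are hypothetical, and the rule used here (most likely cardinality, then the top-scoring labels) is an assumption for illustration, not the training or inference procedure of the paper.

```python
from typing import Dict, List, Sequence


def decode_set(cardinality_probs: Sequence[float],
               label_scores: Dict[str, float]) -> List[str]:
    """Turn two hypothetical network outputs into a predicted label set.

    cardinality_probs[i] is the assumed probability that the set has i elements.
    label_scores maps each candidate label to an assumed relevance score.
    """
    # Pick the most likely set size m.
    m = max(range(len(cardinality_probs)), key=lambda i: cardinality_probs[i])
    # Rank labels by score, highest first; ties are broken alphabetically.
    ranked = sorted(label_scores, key=lambda k: (-label_scores[k], k))
    # Keep the m best labels and return them sorted, since a set has no
    # meaningful order and the result should not depend on the ranking.
    return sorted(ranked[:m])


prediction = decode_set(
    cardinality_probs=[0.05, 0.15, 0.70, 0.10],
    label_scores={"person": 0.9, "dog": 0.8, "car": 0.3, "boat": 0.1},
)
print(prediction)  # ['dog', 'person']
```

Here the most likely cardinality is 2, so the two highest-scoring labels are kept. Returning the labels in sorted order is a design choice that makes the output independent of the order in which candidates were scored.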
