Paper Title
Spatially Attentive Output Layer for Image Classification
Paper Authors
Abstract
Most convolutional neural networks (CNNs) for image classification use global average pooling (GAP) followed by a fully-connected (FC) layer to produce the output logits. However, this spatial aggregation procedure inherently restricts the use of location-specific information at the output layer, even though such spatial information can be beneficial for classification. In this paper, we propose a novel spatial output layer on top of the existing convolutional feature maps to explicitly exploit location-specific output information. Specifically, given the spatial feature maps, we replace the conventional GAP-FC layer with a spatially attentive output layer (SAOL) that applies an attention mask to spatial logits. The proposed location-specific attention selectively aggregates spatial logits within a target region, which leads not only to performance improvements but also to spatially interpretable outputs. Moreover, the proposed SAOL makes it possible to fully exploit location-specific self-supervision as well as self-distillation to enhance generalization during training. SAOL with self-supervision and self-distillation can be easily plugged into existing CNNs. Experimental results on various classification tasks with representative architectures show consistent performance improvements by SAOL at almost the same computational cost.
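The core idea of the abstract can be sketched as follows. This is a minimal, framework-free NumPy illustration, not the authors' implementation: `w_logits` (a per-location 1x1 classifier) and `w_attn` (a per-location attention scorer) are hypothetical parameter names introduced here for exposition, and the attention is assumed to be a softmax over all spatial locations.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def saol_output(feature_map, w_logits, w_attn):
    """Attention-weighted aggregation of spatial logits (a sketch of SAOL).

    feature_map: (H, W, D) convolutional features.
    w_logits:    (D, C) per-location classifier producing spatial logits.
    w_attn:      (D,) per-location attention scorer.
    Returns (C,) class logits: a convex combination of the spatial logits,
    weighted by a softmax attention mask over locations (replacing GAP-FC).
    """
    H, W, _ = feature_map.shape
    spatial_logits = feature_map @ w_logits            # (H, W, C)
    attn_scores = feature_map @ w_attn                 # (H, W)
    attn = softmax(attn_scores.reshape(-1)).reshape(H, W)
    return (attn[..., None] * spatial_logits).sum(axis=(0, 1))

def gap_fc_output(feature_map, w_fc):
    """Baseline GAP-FC head for comparison: pool first, then classify."""
    return feature_map.mean(axis=(0, 1)) @ w_fc        # (C,)
```

Because the attention mask sums to one, the SAOL output is a convex combination of the per-location logits, so the mask itself indicates which regions drove the prediction, which is the source of the spatial interpretability claimed above.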