Paper Title
Spatially Attentive Output Layer for Image Classification
Paper Authors
Abstract
Most convolutional neural networks (CNNs) for image classification use global average pooling (GAP) followed by a fully-connected (FC) layer to produce the output logits. However, this spatial aggregation procedure inherently restricts the use of location-specific information at the output layer, even though such spatial information can be beneficial for classification. In this paper, we propose a novel spatial output layer on top of the existing convolutional feature maps to explicitly exploit location-specific output information. Specifically, given the spatial feature maps, we replace the conventional GAP-FC layer with a spatially attentive output layer (SAOL) that applies an attention mask to spatial logits. The proposed location-specific attention selectively aggregates spatial logits within a target region, which leads not only to performance improvements but also to spatially interpretable outputs. Moreover, the proposed SAOL makes it possible to fully exploit location-specific self-supervision as well as self-distillation to enhance generalization during training. SAOL with self-supervision and self-distillation can be easily plugged into existing CNNs. Experimental results on various classification tasks with representative architectures show consistent performance improvements by SAOL at almost the same computational cost.
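The core idea of the abstract can be sketched as follows. This is a minimal, framework-free NumPy illustration, not the authors' implementation: `w_logits` (a per-location 1x1 classifier) and `w_attn` (a per-location attention scorer) are hypothetical parameter names introduced here for exposition, and the attention is assumed to be a softmax over all spatial locations.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def saol_output(feature_map, w_logits, w_attn):
    """Attention-weighted aggregation of spatial logits (a sketch of SAOL).

    feature_map: (H, W, D) convolutional features.
    w_logits:    (D, C) per-location classifier producing spatial logits.
    w_attn:      (D,) per-location attention scorer.
    Returns (C,) class logits: a convex combination of the spatial logits,
    weighted by a softmax attention mask over locations (replacing GAP-FC).
    """
    H, W, _ = feature_map.shape
    spatial_logits = feature_map @ w_logits            # (H, W, C)
    attn_scores = feature_map @ w_attn                 # (H, W)
    attn = softmax(attn_scores.reshape(-1)).reshape(H, W)
    return (attn[..., None] * spatial_logits).sum(axis=(0, 1))

def gap_fc_output(feature_map, w_fc):
    """Baseline GAP-FC head for comparison: pool first, then classify."""
    return feature_map.mean(axis=(0, 1)) @ w_fc        # (C,)
```

Because the attention mask sums to one, the SAOL output is a convex combination of the per-location logits, so the mask itself indicates which regions drove the prediction, which is the source of the spatial interpretability claimed above.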