Reversible Column Networks

Paper title

Reversible Column Networks

Authors

Yuxuan Cai, Yizhuang Zhou, Qi Han, Jianjian Sun, Xiangwen Kong, Jun Li, Xiangyu Zhang

Abstract

We propose a new neural network design paradigm, the Reversible Column Network (RevCol). The main body of RevCol is composed of multiple copies of a subnetwork, called columns, between which multi-level reversible connections are employed. This architectural scheme gives RevCol very different behavior from conventional networks: during forward propagation, features in RevCol are learned to be gradually disentangled as they pass through each column, while their total information is maintained rather than compressed or discarded as in other networks. Our experiments suggest that CNN-style RevCol models can achieve very competitive performance on multiple computer vision tasks such as image classification, object detection and semantic segmentation, especially with a large parameter budget and a large dataset. For example, after ImageNet-22K pre-training, RevCol-XL obtains 88.2% ImageNet-1K accuracy. Given more pre-training data, our largest model, RevCol-H, reaches 90.0% on ImageNet-1K, 63.8% APbox on the COCO detection minival set, and 61.0% mIoU on ADE20K segmentation. To our knowledge, these are the best COCO detection and ADE20K segmentation results among pure (static) CNN models. Moreover, as a general macro-architecture design, RevCol can also be introduced into transformers or other neural networks, which is demonstrated to improve performance in both computer vision and NLP tasks. We release code and models at https://github.com/megvii-research/RevCol
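The abstract does not give the form of the reversible connections, so the sketch below is only an illustration of the general idea that a reversible mapping maintains information: it uses a generic additive coupling (the RevNet-style construction), not RevCol's multi-level column design. The functions `f` and `g` are hypothetical stand-ins for learned sub-blocks, and all names here are assumptions made for the example.

```python
import math

def f(x):
    # Hypothetical stand-in for a learned sub-block (assumption, not from the paper).
    return [math.tanh(2.0 * v) for v in x]

def g(x):
    # Second hypothetical sub-block; it does not need to be invertible itself.
    return [0.5 * v * v for v in x]

def forward(x1, x2):
    # Additive coupling: each output adds a function of the other stream.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    # Undo the two additions in reverse order to recover the inputs.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

x1, x2 = [0.1, -0.4, 2.0], [1.5, 0.0, -0.7]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)

# The inputs are recovered up to floating-point rounding.
max_error = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print(max_error < 1e-9)
```

Because the inputs can be reconstructed from the outputs, nothing is discarded in the forward pass, even though `f` and `g` themselves may be lossy. This is the property the abstract refers to when it says information is maintained rather than compressed.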
