DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Paper Title

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Authors

Shen, Xing; Yang, Jirui; Wei, Chunbo; Deng, Bing; Huang, Jianqiang; Hua, Xiansheng; Cheng, Xiaoliang; Liang, Kewei

Abstract

Binary grid mask representation is broadly used in instance segmentation. A representative instantiation is Mask R-CNN, which predicts masks on a $28\times 28$ binary grid. Generally, a low-resolution grid is not sufficient to capture details, while a high-resolution grid dramatically increases the training complexity. In this paper, we propose a new mask representation that applies the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector. Our method, termed DCT-Mask, can be easily integrated into most pixel-based instance segmentation methods. Without any bells and whistles, DCT-Mask yields significant gains on different frameworks, backbones, datasets, and training schedules. It requires no pre-processing or pre-training and has almost no impact on running speed. The improvement is especially large for higher-quality annotations and more complex backbones. Moreover, we analyze the performance of our method from the perspective of mask representation quality. The main reason DCT-Mask works well is that it obtains a high-quality mask representation with low complexity. Code is available at https://github.com/aliyun/DCT-Mask.git.
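The core idea the abstract describes (encode a high-resolution binary mask as a compact DCT vector, then decode it back) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation, which lives in the linked repository. It builds an orthonormal 2-D DCT-II, keeps the first N coefficients in a JPEG-style zigzag (low-frequency-first) order, and decodes by zero-filling the dropped coefficients, applying the inverse DCT, and thresholding at 0.5. The grid size K = 128, vector length N = 300, the zigzag ordering, the 0.5 threshold, and the function names `dct_matrix`, `zigzag_indices`, `encode_mask`, and `decode_vector` are all hypothetical choices for illustration; none of them are stated in the abstract.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row uses the 1/sqrt(n) normalisation
    return mat


def zigzag_indices(n: int) -> list:
    """JPEG-style zigzag order over an n x n grid (low frequencies first)."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal r + c; alternate the walking direction per diagonal.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def encode_mask(mask: np.ndarray, num_coeffs: int) -> np.ndarray:
    """Binary n x n mask -> vector of its first num_coeffs DCT coefficients."""
    n = mask.shape[0]
    c = dct_matrix(n)
    coeffs = c @ mask.astype(np.float64) @ c.T  # separable 2-D DCT-II
    rows, cols = zip(*zigzag_indices(n)[:num_coeffs])
    return coeffs[list(rows), list(cols)]


def decode_vector(vec: np.ndarray, n: int) -> np.ndarray:
    """Compact vector -> binary n x n mask (zero-fill, inverse DCT, threshold)."""
    coeffs = np.zeros((n, n))
    rows, cols = zip(*zigzag_indices(n)[:len(vec)])
    coeffs[list(rows), list(cols)] = vec  # dropped high frequencies stay zero
    c = dct_matrix(n)
    recon = c.T @ coeffs @ c  # inverse of the orthonormal 2-D DCT
    return recon >= 0.5  # hypothetical 0.5 threshold back to a binary mask


# Usage: a toy disk-shaped "instance" on a hypothetical 128 x 128 grid.
K, N = 128, 300
yy, xx = np.mgrid[:K, :K]
mask = (yy - 64) ** 2 + (xx - 60) ** 2 < 40 ** 2
vec = encode_mask(mask, N)
recon = decode_vector(vec, K)
iou = (mask & recon).sum() / (mask | recon).sum()
print(vec.shape, K * K)  # (300,) 16384
print(f"IoU of decoded mask vs. original: {iou:.3f}")
```

The design point this illustrates is the size of the prediction target: under these assumed settings a network would regress 300 values instead of 16,384 per-pixel outputs for a 128×128 grid (or 784 for Mask R-CNN's 28×28 grid), while the decoded mask is still produced at 128×128 resolution. Because most of a smooth mask's energy sits in low-frequency DCT coefficients, truncating the high frequencies mainly affects fine boundary detail.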
