Kernel Proposal Network for Arbitrary Shape Text Detection

Paper Title

Kernel Proposal Network for Arbitrary Shape Text Detection

Authors

Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chun Yang, Xu-Cheng Yin

Abstract

Segmentation-based methods have achieved great success in arbitrary shape text detection. However, separating neighboring text instances remains one of the most challenging problems because of the complexity of text in scene images. In this paper, we propose an innovative Kernel Proposal Network (dubbed KPN) for arbitrary shape text detection. The proposed KPN separates neighboring text instances by classifying different texts into instance-independent feature maps, while avoiding the complex aggregation process found in segmentation-based arbitrary shape text detection methods. Concretely, our KPN predicts a Gaussian center map for each text image, which is used to extract a series of candidate kernel proposals (i.e., dynamic convolution kernels) from the embedding feature maps according to their corresponding keypoint positions. To enforce independence between kernel proposals, we propose a novel orthogonal learning loss (OLL) via orthogonal constraints. Specifically, our kernel proposals contain important self-information learned by the network and location information from position embedding. Finally, each kernel proposal individually convolves all embedding feature maps to generate the individual embedded map of a text instance. In this way, our KPN can effectively separate neighboring text instances and improve robustness against unclear boundaries. To our knowledge, our work is the first to introduce the dynamic convolution kernel strategy to efficiently and effectively tackle the adhesion problem of neighboring text instances in text detection. Experimental results on challenging datasets verify the impressive performance and efficiency of our method. The code and model are available at https://github.com/GXYM/KPN.
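
The pipeline described in the abstract (kernel proposals read from the embedding feature maps at keypoint positions, an orthogonality constraint between proposals, and a per-proposal convolution over the embedding) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each kernel proposal is a 1x1 kernel equal to the embedding vector at its keypoint, that the keypoints have already been picked from the Gaussian center map, and that the orthogonality penalty is the mean squared cosine similarity between distinct proposals. The position embedding is omitted, and all function names are hypothetical. Plain Python lists are used to keep the sketch dependency-free; a real model would use a tensor library.

```python
import math

def extract_kernel_proposals(embedding, keypoints):
    """Read one C-dimensional vector per keypoint from a [C][H][W] embedding.

    Assumption: each vector is used directly as a 1x1 dynamic kernel.
    """
    return [[channel[y][x] for channel in embedding] for (y, x) in keypoints]

def apply_kernels(embedding, kernels):
    """Apply every kernel as a 1x1 convolution, giving one H x W map per kernel."""
    height, width = len(embedding[0]), len(embedding[0][0])
    return [
        [
            [sum(w * channel[y][x] for w, channel in zip(kernel, embedding))
             for x in range(width)]
            for y in range(height)
        ]
        for kernel in kernels
    ]

def orthogonality_penalty(kernels):
    """Mean squared cosine similarity over ordered pairs of distinct kernels.

    Assumption: a stand-in for an orthogonal constraint, not the paper's OLL.
    Returns 0.0 when all kernels are mutually orthogonal.
    """
    def cosine(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
        return dot / norm if norm > 0 else 0.0

    count = len(kernels)
    if count < 2:
        return 0.0
    total = sum(
        cosine(kernels[i], kernels[j]) ** 2
        for i in range(count) for j in range(count) if i != j
    )
    return total / (count * (count - 1))

# Toy embedding with 2 channels on a 2 x 4 grid: channel 0 is active on the
# left half (one text instance), channel 1 on the right half (its neighbor).
toy_embedding = [
    [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]],
]
toy_keypoints = [(0, 0), (0, 3)]  # hypothetical (row, col) centers, one per instance

toy_kernels = extract_kernel_proposals(toy_embedding, toy_keypoints)
toy_maps = apply_kernels(toy_embedding, toy_kernels)
toy_penalty = orthogonality_penalty(toy_kernels)
```

In this toy setup each proposal responds only to its own half of the grid, so the two adjacent instances end up in separate maps, and the penalty is zero because the two proposals are orthogonal. Two keypoints that fall on the same instance would yield parallel proposals and the maximum penalty of 1.0.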
