Paper Title
CM-Net: Concentric Mask based Arbitrary-Shaped Text Detection
Authors
Abstract
Recently, fast arbitrary-shaped text detection has become an attractive research topic. However, most existing methods are not real-time, which limits their use in intelligent systems. Although a few real-time text detection methods have been proposed, their detection accuracy lags far behind that of non-real-time methods. To improve detection accuracy and speed simultaneously, we propose a novel fast and accurate text detection framework, namely CM-Net, which is built on a new text representation method and a multi-perspective feature (MPF) module. The former fits arbitrary-shaped text contours with a concentric mask (CM) in an efficient and robust way. The latter encourages the network to learn more CM-related discriminative features from multiple perspectives and brings no extra computational cost. Benefiting from the advantages of CM and MPF, the proposed CM-Net only needs to predict one CM per text instance to rebuild the text contour, and it achieves the best balance between detection accuracy and speed compared with previous works. Moreover, to ensure that multi-perspective features are effectively learned, a multi-factor constraints loss is proposed. Extensive experiments demonstrate that the proposed CM fits arbitrary-shaped text instances efficiently and robustly, and also validate the effectiveness of MPF and the constraints loss for discriminative text feature recognition. Furthermore, experimental results show that the proposed CM-Net is superior to existing state-of-the-art (SOTA) real-time text detection methods in both detection speed and accuracy on the MSRA-TD500, CTW1500, Total-Text, and ICDAR2015 datasets.