ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition

Paper Title

ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition

Authors

Zuoyu Yan, Xiaode Zhang, Liangcai Gao, Ke Yuan, Zhi Tang

Abstract

Despite recent advances in optical character recognition (OCR), mathematical expressions remain challenging to recognize because of their two-dimensional graphical layout. In this paper, we propose a convolutional sequence modeling network, ConvMath, which converts the mathematical expression in an image into a LaTeX sequence in an end-to-end way. The network combines an image encoder for feature extraction and a convolutional decoder for sequence generation. Compared with other Long Short-Term Memory (LSTM) based encoder-decoder models, ConvMath is entirely based on convolution, so it is easy to parallelize. In addition, the network adopts a multi-layer attention mechanism in the decoder, which allows the model to align output symbols with source feature vectors automatically and alleviates the problem of lacking coverage while training the model. The performance of ConvMath is evaluated on an open dataset named IM2LATEX-100K, which includes 103556 samples. The experimental results demonstrate that the proposed network achieves state-of-the-art accuracy and much better efficiency than previous methods.
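
The abstract does not give layer sizes, gating, or the exact attention formula, so the following is only a minimal sketch of the general idea it describes: a causal convolution over the previously generated symbols, followed by attention over the image feature vectors. All names (causal_conv, attend, decoder_layer), the scalar-tap depthwise convolution, and the plain dot-product attention are illustrative assumptions, not the authors' implementation.

```python
import math

def causal_conv(seq, taps):
    """Simplified depthwise causal convolution (an assumption, not the paper's layer).

    seq  : list of d-dimensional vectors (embeddings of previous symbols)
    taps : list of k scalar weights; taps[-1] applies to the current position
    Output position t depends only on seq[t-k+1 .. t], so every position can be
    computed independently of the others, which is what makes training parallel.
    """
    k, d = len(taps), len(seq[0])
    out = []
    for t in range(len(seq)):
        acc = [0.0] * d
        for j, w in enumerate(taps):
            src = t - (k - 1) + j          # position read by tap j
            if src >= 0:                   # implicit left zero-padding, no look-ahead
                for i in range(d):
                    acc[i] += w * seq[src][i]
        out.append(acc)
    return out

def softmax(scores):
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """Dot-product attention of one decoder state over image feature vectors."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    d = len(features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, features)) for i in range(d)]
    return weights, context

def decoder_layer(seq, taps, features):
    """One convolution + attention layer; the context is added to each state."""
    out = []
    for h in causal_conv(seq, taps):
        _, ctx = attend(h, features)
        out.append([a + b for a, b in zip(h, ctx)])
    return out

# Hypothetical toy inputs: three symbol embeddings and two image feature vectors.
symbols = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
taps = [0.5, 1.0]

# Stacking layers gives each layer its own attention step ("multi-layer attention").
hidden = decoder_layer(decoder_layer(symbols, taps, features), taps, features)
```

Because each layer recomputes attention from its own states, a stack of such layers attends to the image features several times per output symbol, which is one plausible reading of the multi-layer attention mentioned in the abstract.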
