Paper Title

Discovering Mathematical Objects of Interest -- A Study of Mathematical Notations

Authors

Andre Greiner-Petter, Moritz Schubotz, Fabian Mueller, Corinna Breitinger, Howard S. Cohl, Akiko Aizawa, Bela Gipp

Abstract

Mathematical notation, i.e., the writing system used to communicate concepts in mathematics, encodes valuable information for a variety of information search and retrieval systems. Yet mathematical notation remains mostly unutilized by today's systems. In this paper, we present the first in-depth study of the distributions of mathematical notation in two large scientific corpora: the open-access arXiv (2.5B mathematical objects) and zbMATH, the reviewing service for pure and applied mathematics (61M mathematical objects). Our study lays a foundation for future research projects on mathematical information retrieval for large scientific corpora. Further, we demonstrate the relevance of our results to a variety of use cases, for example, assisting semantic extraction systems, improving scientific search engines, and facilitating specialized math recommendation systems. The contributions of our research are as follows: (1) we present the first distributional analysis of mathematical formulae on arXiv and zbMATH; (2) we retrieve relevant mathematical objects for given textual search queries (e.g., linking $P_{n}^{(\alpha, \beta)}\!\left(x\right)$ with 'Jacobi polynomial'); (3) we extend zbMATH's search engine by providing relevant mathematical formulae; and (4) we exemplify the applicability of the results by presenting auto-completion for math inputs as the first contribution to math recommendation systems. To expedite future research projects, we have made our source code and data available.
