AntCritic: Argument Mining for Free-Form and Visually-Rich Financial Comments

Paper Title

AntCritic: Argument Mining for Free-Form and Visually-Rich Financial Comments

Authors

Huadai Liu, Wenqiang Xu, Xuan Lin, Jingjing Huo, Hong Chen, Zhou Zhao

Abstract

Argument mining aims to automatically detect all argumentative components in a text and identify the relations between them. It is a thriving task in natural language processing, and a large number of corpora have been built for academic study and application development. Research in this area is nevertheless constrained by the inherent limitations of existing datasets. Specifically, all publicly available datasets are relatively small in scale, and few of them provide information from other modalities to facilitate learning. Moreover, the statements and expressions in these corpora are usually in a compact form, which restricts the generalization ability of models. To this end, we collect a novel dataset, AntCritic, as a complement to existing resources. It consists of about 10k free-form and visually-rich financial comments and supports both argument component detection and argument relation prediction. To cope with the challenges brought by this expanded scenario, we thoroughly explore fine-grained relation prediction and structure reconstruction schemes and discuss encoding mechanisms for visual styles and layouts. On this basis, we design two simple but effective model architectures and conduct various experiments on the dataset, both to provide benchmark performance as a reference and to verify the practicability of the proposed architectures. We release our data and code at this link, and the dataset is distributed under the CC BY-NC-ND 4.0 license.
