Paper title
Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents
Abstract
We investigate semi-structured document classification in a zero-shot setting. Classifying semi-structured documents is more challenging than classifying standard unstructured documents, because positional, layout, and style information play a vital role in interpreting them. The standard classification setting, where categories are fixed during both training and testing, falls short in dynamic environments where new document categories may emerge. We focus exclusively on the zero-shot setting, where inference is performed on new, unseen classes. To address this task, we propose a matching-based approach that relies on a pairwise contrastive objective for both pretraining and fine-tuning. Our results show a significant boost in macro-F1 from the proposed pretraining step in both supervised and unsupervised zero-shot settings.
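The abstract does not specify the encoder, the exact form of the contrastive loss, or how class names are represented, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes documents and class descriptions have already been mapped to vectors by some encoder (a hypothetical layout-aware model, not shown). It uses an InfoNCE-style in-batch loss as one common instance of a pairwise contrastive objective, which may differ from the paper's loss. It predicts an unseen class by cosine similarity to that class's description embedding and scores the predictions with macro-F1. All function names, the temperature of 0.1, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def info_nce_loss(docs: List[Vector], labels: List[Vector], temperature: float = 0.1) -> float:
    """InfoNCE-style contrastive loss over a batch of matched (document, label) pairs.

    Assumption: docs[i] and labels[i] form the positive pair, and every labels[j]
    with j != i acts as an in-batch negative for docs[i].
    """
    total = 0.0
    for i, d in enumerate(docs):
        logits = [cosine(d, lab) / temperature for lab in labels]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # negative log-softmax probability of the positive
    return total / len(docs)


def zero_shot_predict(doc: Vector, class_embs: Dict[str, Vector]) -> str:
    """Match a document to the unseen class whose description embedding is most similar."""
    return max(class_embs, key=lambda name: cosine(doc, class_embs[name]))


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over every class appearing in gold or pred."""
    scores = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example: two hypothetical unseen classes with hand-made 2-d embeddings.
    classes = {"invoice": [1.0, 0.0], "receipt": [0.0, 1.0]}
    docs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    gold = ["invoice", "receipt", "receipt"]
    pred = [zero_shot_predict(d, classes) for d in docs]
    print(pred, round(macro_f1(gold, pred), 3))  # prints ['invoice', 'receipt', 'invoice'] 0.667
```

The matching formulation is what makes the zero-shot setting possible. A new category needs only a description embedding at inference time, with no new classifier head and no retraining. Using the other pairs in a batch as negatives is a common way to get many contrastive comparisons cheaply.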