Paper Title

A Computational Acquisition Model for Multimodal Word Categorization

Paper Authors

Uri Berger, Gabriel Stanovsky, Omri Abend, Lea Frermann

Paper Abstract

Recent advances in self-supervised modeling of text and images open new opportunities for computational models of child language acquisition, which is believed to rely heavily on cross-modal signals. However, prior studies have been limited by their reliance on vision models trained on large image datasets annotated with a pre-defined set of depicted object categories. This is (a) not faithful to the information children receive and (b) prohibits the evaluation of such models with respect to category learning tasks, due to the pre-imposed category structure. We address this gap, and present a cognitively-inspired, multimodal acquisition model, trained from image-caption pairs on naturalistic data using cross-modal self-supervision. We show that the model learns word categories and object recognition abilities, and presents trends reminiscent of those reported in the developmental literature. We make our code and trained models public for future reference and use.
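The abstract states that the model is trained on image-caption pairs with cross-modal self-supervision but does not spell out the objective. As a rough illustration only, the following is a minimal PyTorch sketch of one common form of such training: a CLIP-style contrastive loss that pulls matched image-caption pairs together in a shared embedding space and pushes mismatched pairs apart. The encoder inputs, dimensions, projection layers, and loss here are assumptions for illustration, not the paper's actual architecture or training procedure.

```python
# Minimal sketch of cross-modal self-supervision from image-caption pairs
# (a CLIP-style contrastive objective). All dimensions and components are
# illustrative assumptions, not the model described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalModel(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, embed_dim=256):
        super().__init__()
        # Project pre-extracted image and caption features into a shared space.
        self.image_proj = nn.Linear(image_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)

    def forward(self, image_feats, text_feats):
        img = F.normalize(self.image_proj(image_feats), dim=-1)
        txt = F.normalize(self.text_proj(text_feats), dim=-1)
        return img, txt

def contrastive_loss(img, txt, temperature=0.07):
    # Each image is paired with its own caption; every other caption in the
    # batch serves as a negative example (and symmetrically for captions).
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random features standing in for encoder outputs.
model = CrossModalModel()
image_feats = torch.randn(8, 2048)   # e.g. image features for 8 images
text_feats = torch.randn(8, 768)     # e.g. embeddings of their captions
img, txt = model(image_feats, text_feats)
loss = contrastive_loss(img, txt)
loss.backward()
```

Under this kind of objective, no pre-defined object categories are imposed: category structure and object recognition ability, as the abstract describes, would have to emerge from the learned embedding space itself.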
