Paper title
Contextualized Sensorimotor Norms: multi-dimensional measures of sensorimotor strength for ambiguous English words, in context
Authors
Abstract
Most large language models are trained on linguistic input alone, yet humans appear to ground their understanding of words in sensorimotor experience. A natural solution is to augment LM representations with human judgments of a word's sensorimotor associations (e.g., the Lancaster Sensorimotor Norms), but this raises another challenge: most words are ambiguous, and judgments of words in isolation fail to account for this multiplicity of meaning (e.g., "wooden table" vs. "data table"). We attempted to address this problem by building a new lexical resource of contextualized sensorimotor judgments for 112 English words, each rated in four different contexts (448 sentences total). We show that these ratings encode overlapping but distinct information from the Lancaster Sensorimotor Norms, and that they also predict other measures of interest (e.g., relatedness), above and beyond measures derived from BERT. Beyond shedding light on theoretical questions, we suggest that these ratings could be of use as a "challenge set" for researchers building grounded language models.