Estimating the Personality of White-Box Language Models

Paper Title

Estimating the Personality of White-Box Language Models

Authors

Saketh Reddy Karra, Son The Nguyen, Theja Tulabandhula

Abstract

Technology for open-ended language generation, a key application of artificial intelligence, has advanced to a great extent in recent years. Large-scale language models, which are trained on large corpora of text, are being used in a wide range of applications everywhere, from virtual assistants to conversational bots. While these language models output fluent text, existing research shows that these models can and do capture human biases. Many of these biases, especially those that could potentially cause harm, are being well-investigated. On the other hand, studies that infer and change human personality traits inherited by these models have been scarce or non-existent. Our work seeks to address this gap by exploring the personality traits of several large-scale language models designed for open-ended text generation and the datasets used for training them. We build on the popular Big Five factors and develop robust methods that quantify the personality traits of these models and their underlying datasets. In particular, we trigger the models with a questionnaire designed for personality assessment and subsequently classify the text responses into quantifiable traits using a Zero-shot classifier. Our estimation scheme sheds light on an important anthropomorphic element found in such AI models and can help stakeholders decide how they should be applied as well as how society could perceive them. Additionally, we examined approaches to alter these personalities, adding to our understanding of how AI models can be adapted to specific contexts.
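
The estimation scheme described in the abstract (prompt a model with questionnaire items, then map each free-text response to a quantifiable answer with a zero-shot classifier) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 5-point answer scale, the three sample items, the reverse-keying rule, and the names `estimate_traits`, `toy_generate`, and `toy_classify` are all assumptions made for this example. In practice, `generate` would wrap a text-generation model and `classify` an actual zero-shot classifier.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Assumed 5-point answer scale; the abstract does not specify the labels.
LIKERT: Dict[str, int] = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Illustrative items only: (trait, statement, reverse_keyed).
ITEMS: List[Tuple[str, str, bool]] = [
    ("extraversion", "I am the life of the party.", False),
    ("extraversion", "I don't talk a lot.", True),
    ("agreeableness", "I sympathize with others' feelings.", False),
]


def estimate_traits(
    generate: Callable[[str], str],
    classify: Callable[[str, List[str]], Dict[str, float]],
    items: List[Tuple[str, str, bool]] = ITEMS,
) -> Dict[str, float]:
    """Return a mean score per trait on the assumed 1-5 scale."""
    per_trait: Dict[str, List[float]] = {}
    for trait, statement, reverse in items:
        # Step 1: trigger the model with a questionnaire item.
        response = generate(statement)
        # Step 2: score the free-text response against the answer labels.
        probs = classify(response, list(LIKERT))
        total = sum(probs.values())
        # Expected Likert value under the classifier's label distribution.
        score = sum(LIKERT[label] * p for label, p in probs.items()) / total
        # Reverse-keyed items count in the opposite direction.
        if reverse:
            score = 6 - score
        per_trait.setdefault(trait, []).append(score)
    return {trait: mean(values) for trait, values in per_trait.items()}


def toy_generate(prompt: str) -> str:
    """Stand-in for a language model; always gives the same reply."""
    return "I agree with that."


def toy_classify(text: str, labels: List[str]) -> Dict[str, float]:
    """Stand-in for a zero-shot classifier; always picks 'agree'."""
    return {label: 1.0 if label == "agree" else 0.0 for label in labels}
```

Calling `estimate_traits(toy_generate, toy_classify)` runs the whole loop with the stand-ins. Passing the generator and classifier as plain callables keeps the scoring logic independent of any particular model library.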
