Paper Title


The State of Profanity Obfuscation in Natural Language Processing

Authors

Debora Nozza and Dirk Hovy

Abstract


Work on hate speech has made the consideration of rude and harmful examples in scientific publications inevitable. This raises various problems, such as whether or not to obscure profanities. While science must accurately disclose what it does, the unwarranted spread of hate speech is harmful to readers and increases its frequency on the internet. At the same time, while obfuscating profanities maintains publications' professional appearance, it makes the content challenging to evaluate, especially for non-native speakers. Surveying 150 ACL papers, we discovered that obfuscation is usually employed for English but not for other languages, and even then quite unevenly. We discuss the problems with obfuscation and suggest a multilingual community resource called PrOf that has a Python module to standardize profanity obfuscation processes. We believe PrOf can help scientific publication policies to make hate speech work accessible and comparable, irrespective of language.
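The abstract describes standardizing the obfuscation of profanities in published examples. As an illustration only, a common convention in NLP papers is to keep a word's first and last characters and mask the interior. The sketch below is a hypothetical minimal implementation of that convention; it is not PrOf's actual API, whose interface the abstract does not specify.

```python
def obfuscate(word: str, mask: str = "*") -> str:
    """Mask the interior of a profanity, keeping its first and last letters.

    Illustrative sketch of one common obfuscation convention; NOT the
    actual PrOf module interface.
    """
    if len(word) <= 2:
        # Too short to keep boundary letters; mask everything.
        return mask * len(word)
    return word[0] + mask * (len(word) - 2) + word[-1]


# Example usage with a mild placeholder word:
print(obfuscate("darn"))  # d**n
```

Keeping the boundary characters preserves enough signal for readers to reconstruct the example, which is one reason the paper argues a single standardized scheme aids cross-paper comparability.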
