Paper Title

Dutch Humor Detection by Generating Negative Examples

Authors

Thomas Winters, Pieter Delobelle

Abstract

Detecting if a text is humorous is a hard task to do computationally, as it usually requires linguistic and common sense insights. In machine learning, humor detection is usually modeled as a binary classification task, trained to predict if the given text is a joke or another type of text. Rather than using completely different non-humorous texts, we propose using text generation algorithms for imitating the original joke dataset to increase the difficulty for the learning algorithm. We constructed several different joke and non-joke datasets to test the humor detection abilities of different language technologies. In particular, we compare the humor detection capabilities of classic neural network approaches with the state-of-the-art Dutch language model RobBERT. In doing so, we create and compare the first Dutch humor detection systems. We found that while other language models perform well when the non-jokes came from completely different domains, RobBERT was the only one that was able to distinguish jokes from generated negative examples. This performance illustrates the usefulness of using text generation to create negative datasets for humor recognition, and also shows that transformer models are a large step forward in humor detection.
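The abstract describes two ingredients: negative examples generated to imitate the joke corpus, and a binary classifier fine-tuned to separate jokes from those negatives. The Python sketch below illustrates only the classification step with Hugging Face Transformers, under stated assumptions: the RobBERT checkpoint id, the toy two-sentence dataset (where the "non-joke" is simply a word-shuffled copy of the joke, a crude stand-in for the paper's generated negatives), and all hyperparameters are illustrative and not taken from the paper.

```python
# Minimal sketch: fine-tuning a Dutch transformer for binary joke/non-joke
# classification. Checkpoint id, data, and hyperparameters are assumptions
# for illustration only, not the authors' actual setup.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Toy corpus: label 1 = joke, label 0 = negative example.
# Here the negative is a word-shuffled joke, mimicking (crudely) the idea of
# generating negatives that resemble the original joke dataset.
data = Dataset.from_dict({
    "text": [
        "Wat is groen en springt door de tuin? Een kruimeldief!",
        "Wat is groen en tuin de door springt? Een dief kruimel!",
    ],
    "label": [1, 0],
})

checkpoint = "pdelobelle/robbert-v2-dutch-base"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    # Pad/truncate so the default collator can batch the examples directly.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="robbert-humor",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=data,
)
trainer.train()
```

In practice one would replace the toy dataset with a real joke corpus and a matching set of generated negative examples, and add a held-out evaluation split to measure how well the model separates the two.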
