Paper Title
Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals
Paper Authors
Paper Abstract
Spurious correlations threaten the validity of statistical classifiers. While model accuracy may appear high when the test data is from the same distribution as the training data, it can quickly degrade when the test distribution changes. For example, it has been shown that classifiers perform poorly when humans make minor modifications to change the label of an example. One solution to increase model reliability and generalizability is to identify causal associations between features and classes. In this paper, we propose to train a robust text classifier by augmenting the training data with automatically generated counterfactual data. We first identify likely causal features using a statistical matching approach. Next, we generate counterfactual samples for the original training data by substituting causal features with their antonyms and then assigning opposite labels to the counterfactual samples. Finally, we combine the original data and counterfactual data to train a robust classifier. Experiments on two classification tasks show that a traditional classifier trained on the original data does very poorly on human-generated counterfactual samples (e.g., 10%-37% drop in accuracy). However, the classifier trained on the combined data is more robust and performs well on both the original test data and the counterfactual test data (e.g., 12%-25% increase in accuracy compared with the traditional classifier). Detailed analysis shows that the robust classifier makes meaningful and trustworthy predictions by emphasizing causal features and de-emphasizing non-causal features.
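The augmentation procedure described above (substitute causal features with antonyms, flip the label, combine with the original data) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the antonym dictionary below is a hypothetical stand-in for the causal features the paper identifies via statistical matching, which is not reproduced here.

```python
# Hypothetical causal-feature -> antonym lexicon (illustrative only; the
# paper derives causal features from a statistical matching procedure).
ANTONYMS = {
    "good": "bad", "bad": "good",
    "great": "terrible", "terrible": "great",
    "love": "hate", "hate": "love",
}

def make_counterfactual(text, label):
    """Replace causal words with their antonyms and flip the binary label."""
    tokens = text.split()
    flipped = [ANTONYMS.get(t.lower(), t) for t in tokens]
    return " ".join(flipped), 1 - label

def augment(dataset):
    """Combine original (text, label) pairs with generated counterfactuals."""
    augmented = list(dataset)
    for text, label in dataset:
        cf_text, cf_label = make_counterfactual(text, label)
        # Keep the counterfactual only if at least one causal word changed.
        if cf_text != text:
            augmented.append((cf_text, cf_label))
    return augmented
```

A robust classifier would then be trained on the output of `augment` rather than on the original data alone, so that non-causal features no longer correlate with the labels.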