Title
On the data requirements of probing
Authors
Abstract
As large and powerful neural language models are developed, researchers have become increasingly interested in building diagnostic tools to probe them. Many papers draw conclusions of the form "observation X is found in model Y", each using its own dataset of a different size. Larger probing datasets bring more reliability, but are also expensive to collect. There is not yet a quantitative method for estimating a reasonable probing dataset size. We tackle this omission in the context of comparing two probing configurations: after collecting a small dataset in a pilot study, how many additional data samples are sufficient to distinguish the two configurations? We present a novel method to estimate the required number of data samples in such experiments and, across several case studies, we verify that our estimates have sufficient statistical power. Our framework helps to systematically construct probing datasets for diagnosing neural NLP models.
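The question posed in the abstract is, at heart, a statistical power calculation. As an illustrative sketch only (not the paper's actual method), the following stdlib-only Python snippet uses the standard normal-approximation formula for comparing two proportions to estimate how many probing samples are needed to distinguish two configurations whose accuracies are assumed to be `p1` and `p2` (hypothetical values chosen for the example):

```python
import math
from statistics import NormalDist

def required_samples(p1: float, p2: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Estimate the per-configuration sample size needed to distinguish
    two probing accuracies p1 and p2 at significance level `alpha` with
    the given statistical power (two-sided two-proportion z-test).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical pilot-study estimates: configuration A probes at 80%
# accuracy, configuration B at 85%. How many samples per configuration?
n = required_samples(0.80, 0.85)
print(n)
```

Note how the required sample size grows quadratically as the gap between the two accuracies shrinks, which is why small probing datasets can fail to separate closely matched configurations.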