The Game of Hidden Rules: A New Kind of Benchmark Challenge for Machine Learning

Paper Title

The Game of Hidden Rules: A New Kind of Benchmark Challenge for Machine Learning

Authors

Eric Pulick, Shubham Bharti, Yiding Chen, Vladimir Menkov, Yonatan Mintz, Paul Kantor, Vicki M. Bier

Abstract

As machine learning (ML) is more tightly woven into society, it is imperative that we better characterize ML's strengths and limitations if we are to employ it responsibly. Existing benchmark environments for ML, such as board and video games, offer well-defined benchmarks for progress, but constituent tasks are often complex, and it is frequently unclear how task characteristics contribute to overall difficulty for the machine learner. Likewise, without a systematic assessment of how task characteristics influence difficulty, it is challenging to draw meaningful connections between performance in different benchmark environments. We introduce a novel benchmark environment that offers an enormous range of ML challenges and enables precise examination of how task elements influence practical difficulty. The tool frames learning tasks as a "board-clearing game," which we call the Game of Hidden Rules (GOHR). The environment comprises an expressive rule language and a captive server environment that can be installed locally. We propose a set of benchmark rule-learning tasks and plan to support a performance leader-board for researchers interested in attempting to learn our rules. GOHR complements existing environments by allowing fine, controlled modifications to tasks, enabling experimenters to better understand how each facet of a given learning task contributes to its practical difficulty for an arbitrary ML algorithm.
