通过使用基于模型的条件独立性测试来提高PC算法的效率

论文标题

通过使用基于模型的条件独立性测试来提高PC算法的效率

Improving the Efficiency of the PC Algorithm by Using Model-Based Conditional Independence Tests

论文作者

Cai, Erica, McGregor, Andrew, Jensen, David

论文摘要

学习因果结构在人工智能的许多领域都有用，包括计划，机器人技术和解释。基于约束的结构学习算法，例如PC使用条件独立性（CI）测试来推断因果结构。传统上，基于约束的算法执行CI测试，偏爱较小的调理集，部分是因为随着调节集的大小增加，常规CI测试的统计能力迅速下降。但是，许多现代有条件的独立性测试都是基于模型的，这些测试使用了良好的模型，即使使用非常庞大的调理集，也可以保持统计能力。这表明对基于约束的算法进行了有趣的新策略，这可能会导致执行的CI测试总数减少：测试变量对首先具有较大的调理组，作为一个预处理的步骤，以迅速找到某些有条件的独立性，然后才能继续进行更常规的策略，从而有利于小型条件集。我们为PC算法提出了这样的预处理步骤，该步骤依赖于在一些随机选择的大调节集上进行CI测试。我们对对应于现实世界系统以及ERDőS-RENYI DAG的经验和理论分析对应的定向无环图（DAG）进行经验分析。我们的结果表明，预处理加PC（P3PC）的CI测试要比原始PC算法少得多，在0.5％至36％之间，通常少于PC算法仅执行的CI测试中的CI测试。对于对应于现实世界系统的DAG，效率提高尤其重要。

Learning causal structure is useful in many areas of artificial intelligence, including planning, robotics, and explanation. Constraint-based structure learning algorithms such as PC use conditional independence (CI) tests to infer causal structure. Traditionally, constraint-based algorithms perform CI tests with a preference for smaller-sized conditioning sets, partially because the statistical power of conventional CI tests declines rapidly as the size of the conditioning set increases. However, many modern conditional independence tests are model-based, and these tests use well-regularized models that maintain statistical power even with very large conditioning sets. This suggests an intriguing new strategy for constraint-based algorithms which may result in a reduction of the total number of CI tests performed: Test variable pairs with large conditioning sets first, as a pre-processing step that finds some conditional independencies quickly, before moving on to the more conventional strategy that favors small conditioning sets. We propose such a pre-processing step for the PC algorithm which relies on performing CI tests on a few randomly selected large conditioning sets. We perform an empirical analysis on directed acyclic graphs (DAGs) that correspond to real-world systems and both empirical and theoretical analyses for Erdős-Renyi DAGs. Our results show that Pre-Processing Plus PC (P3PC) performs far fewer CI tests than the original PC algorithm, between 0.5% to 36%, and often less than 10%, of the CI tests that the PC algorithm alone performs. The efficiency gains are particularly significant for the DAGs corresponding to real-world systems.

下载PDF全文

下载文献需遵守相关版权规定

论文标题