论文标题
通过使用基于模型的条件独立性测试来提高PC算法的效率
Improving the Efficiency of the PC Algorithm by Using Model-Based Conditional Independence Tests
论文作者
论文摘要
学习因果结构在人工智能的许多领域都有用,包括计划,机器人技术和解释。基于约束的结构学习算法,例如PC使用条件独立性(CI)测试来推断因果结构。传统上,基于约束的算法执行CI测试,偏爱较小的调理集,部分是因为随着调节集的大小增加,常规CI测试的统计能力迅速下降。但是,许多现代有条件的独立性测试都是基于模型的,这些测试使用了良好的模型,即使使用非常庞大的调理集,也可以保持统计能力。这表明对基于约束的算法进行了有趣的新策略,这可能会导致执行的CI测试总数减少:测试变量对首先具有较大的调理组,作为一个预处理的步骤,以迅速找到某些有条件的独立性,然后才能继续进行更常规的策略,从而有利于小型条件集。我们为PC算法提出了这样的预处理步骤,该步骤依赖于在一些随机选择的大调节集上进行CI测试。我们对对应于现实世界系统以及ERDőS-RENYI DAG的经验和理论分析对应的定向无环图(DAG)进行经验分析。我们的结果表明,预处理加PC(P3PC)的CI测试要比原始PC算法少得多,在0.5%至36%之间,通常少于PC算法仅执行的CI测试中的CI测试。对于对应于现实世界系统的DAG,效率提高尤其重要。
Learning causal structure is useful in many areas of artificial intelligence, including planning, robotics, and explanation. Constraint-based structure learning algorithms such as PC use conditional independence (CI) tests to infer causal structure. Traditionally, constraint-based algorithms perform CI tests with a preference for smaller-sized conditioning sets, partially because the statistical power of conventional CI tests declines rapidly as the size of the conditioning set increases. However, many modern conditional independence tests are model-based, and these tests use well-regularized models that maintain statistical power even with very large conditioning sets. This suggests an intriguing new strategy for constraint-based algorithms which may result in a reduction of the total number of CI tests performed: Test variable pairs with large conditioning sets first, as a pre-processing step that finds some conditional independencies quickly, before moving on to the more conventional strategy that favors small conditioning sets. We propose such a pre-processing step for the PC algorithm which relies on performing CI tests on a few randomly selected large conditioning sets. We perform an empirical analysis on directed acyclic graphs (DAGs) that correspond to real-world systems and both empirical and theoretical analyses for Erdős-Renyi DAGs. Our results show that Pre-Processing Plus PC (P3PC) performs far fewer CI tests than the original PC algorithm, between 0.5% to 36%, and often less than 10%, of the CI tests that the PC algorithm alone performs. The efficiency gains are particularly significant for the DAGs corresponding to real-world systems.