Paper title

DagSim: Combining DAG-based model structure with unconstrained data types and relations for flexible, transparent, and modularized data simulation

Authors

Ghadi S. Al Hajj, Johan Pensar, Geir Kjetil Sandve

Abstract

Data simulation is fundamental for machine learning and causal inference, as it allows exploration of scenarios and assessment of methods in settings with full control of ground truth. Directed acyclic graphs (DAGs) are well established for encoding the dependence structure over a collection of variables in both inference and simulation settings. However, while modern machine learning is applied to data of an increasingly complex nature, DAG-based simulation frameworks are still confined to settings with relatively simple variable types and functional forms. We here present DagSim, a Python-based framework for DAG-based data simulation without any constraints on variable types or functional relations. A succinct YAML format for defining the simulation model structure promotes transparency, while separate user-provided functions for generating each variable based on its parents ensure simulation code modularization. We illustrate the capabilities of DagSim through use cases where metadata variables control shapes in an image and patterns in bio-sequences.
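The core idea in the abstract, one user-provided function per variable, evaluated in an order that respects the DAG, can be illustrated with a minimal sketch in plain Python. This is not DagSim's actual API: the names simulate, MODEL, draw_shape, and draw_image are hypothetical and chosen only for illustration, and the toy model (a "shape" metadata variable controlling a small binary image) is an assumption loosely modeled on the image use case the abstract mentions.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def draw_shape(rng):
    """Root node (no parents): a metadata variable naming the shape."""
    return rng.choice(["square", "cross"])


def draw_image(rng, shape):
    """Child node: a 5x5 binary image whose content depends on `shape`."""
    size = 5
    image = [[0] * size for _ in range(size)]
    if shape == "square":
        for row in range(1, 4):
            for col in range(1, 4):
                image[row][col] = 1
    else:  # "cross"
        for i in range(size):
            image[2][i] = 1
            image[i][2] = 1
    return image


# Hypothetical model description: node name -> (function, parent names).
MODEL = {
    "shape": (draw_shape, []),
    "image": (draw_image, ["shape"]),
}


def simulate(model, num_samples, seed=0):
    """Draw samples by evaluating each node after all of its parents."""
    rng = random.Random(seed)
    # TopologicalSorter expects node -> predecessors and raises
    # graphlib.CycleError if the graph is not acyclic.
    graph = {name: set(parents) for name, (_, parents) in model.items()}
    order = list(TopologicalSorter(graph).static_order())
    samples = []
    for _ in range(num_samples):
        values = {}
        for name in order:
            func, parents = model[name]
            values[name] = func(rng, *(values[p] for p in parents))
        samples.append(values)
    return samples


samples = simulate(MODEL, num_samples=3)
```

Because each node's value is an arbitrary Python object returned by an arbitrary function, nothing in this sketch restricts variables to scalars or relations to a fixed functional form, which is the kind of flexibility the abstract describes.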
