Trajectory Balance: Improved Credit Assignment in GFlowNets

Paper title

Trajectory balance: Improved credit assignment in GFlowNets

Authors

Nikolay Malkin, Moksh Jain, Emmanuel Bengio, Chen Sun, Yoshua Bengio

Abstract

Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object. We find previously proposed learning objectives for GFlowNets, flow matching and detailed balance, which are analogous to temporal difference learning, to be prone to inefficient credit propagation across long action sequences. We thus propose a new learning objective for GFlowNets, trajectory balance, as a more efficient alternative to previously used objectives. We prove that any global minimizer of the trajectory balance objective can define a policy that samples exactly from the target distribution. In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces.
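
The abstract names the trajectory balance objective without stating it. As a minimal sketch, assuming the commonly cited form of the loss (the squared difference between the log forward flow, log Z plus the summed log forward-policy probabilities, and the log backward flow, log R(x) plus the summed log backward-policy probabilities), a single-trajectory loss could be computed as below. The function name, argument names, and the toy numbers are illustrative assumptions and do not come from this listing.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """Squared log-ratio of forward and backward flow along one trajectory.

    log_z         -- learned estimate of the log partition function (assumed scalar)
    log_pf_steps  -- log forward-policy probabilities, one per step of the trajectory
    log_pb_steps  -- log backward-policy probabilities, one per step of the trajectory
    log_reward    -- log of the unnormalized density R(x) of the terminal object
    """
    forward = log_z + sum(log_pf_steps)
    backward = log_reward + sum(log_pb_steps)
    return (forward - backward) ** 2


# Toy check (hypothetical numbers): two terminal objects with rewards 1 and 3,
# each reached in one step from the initial state, so Z = 4 and the backward
# policy is deterministic (log probability 0).
loss_x1 = trajectory_balance_loss(math.log(4.0), [math.log(0.25)], [0.0], math.log(1.0))
loss_x2 = trajectory_balance_loss(math.log(4.0), [math.log(0.75)], [0.0], math.log(3.0))
```

In this toy setting a policy that samples each object in proportion to its reward drives the loss to zero (up to floating-point error) on both trajectories, which matches the abstract's claim that a global minimizer samples from the target distribution. Because the residual is computed over a whole trajectory, the reward signal reaches every step at once rather than propagating one transition at a time.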
