Title
Towards Causal Credit Assignment
Abstract
Adequately assigning credit to actions according to their contribution to future outcomes is a long-standing open challenge in Reinforcement Learning. The assumptions of the most commonly used credit assignment method are disadvantageous in tasks where the effects of decisions are not immediately evident. Furthermore, this method can only evaluate actions that the agent actually selected, which makes it highly inefficient. Still, no alternative method has been widely adopted in the field. Hindsight Credit Assignment is a promising but still unexplored candidate that aims to solve the problems of both long-term and counterfactual credit assignment. In this thesis, we empirically investigate Hindsight Credit Assignment to identify its main benefits and the key points to improve. We then apply it to factored state representations, in particular to state representations based on the causal structure of the environment. In this setting, we propose a variant of Hindsight Credit Assignment that effectively exploits a given causal structure. We show that our modification greatly decreases the workload of Hindsight Credit Assignment, making it more efficient and enabling it to outperform the baseline credit assignment method on various tasks. This opens the way to other methods based on given or learned causal structures.
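The abstract contrasts a credit assignment method that can only evaluate the action actually taken with Hindsight Credit Assignment, which also addresses counterfactual credit. As an illustration only, here is a minimal tabular sketch of the state-conditional HCA estimator from Harutyunyan et al. (2019), Q(x, a) ≈ r(x, a) + Σ_{k≥1} γ^k · h(a | x, X_k) / π(a | x) · R_k, which produces an estimate for every action from a single trajectory. It is not the causal variant proposed in this thesis. The function name `hca_q_estimates`, its arguments, the toy environment, and the assumption that the hindsight distribution h and the immediate reward r(x, a) are already known (in practice h would be learned) are all hypothetical.

```python
from typing import Callable, Dict, Hashable, Sequence

State = Hashable
Action = Hashable


def hca_q_estimates(
    states: Sequence[State],
    rewards: Sequence[float],
    actions: Sequence[Action],
    policy: Callable[[State, Action], float],
    hindsight: Callable[[State, State, Action], float],
    immediate_reward: Callable[[State, Action], float],
    gamma: float = 0.99,
) -> Dict[Action, float]:
    """Single-trajectory HCA estimate of Q(x0, a) for EVERY action a (sketch).

    states[k]  : state X_k visited at step k (states[0] is the start state x0)
    rewards[k] : reward R_k received at step k
    policy(x, a)           -> pi(a | x); must be > 0 for every action evaluated
    hindsight(x, x_k, a)   -> h(a | x, x_k), assumed given (hypothetical helper)
    immediate_reward(x, a) -> expected immediate reward r(x, a), assumed known
    """
    x0 = states[0]
    estimates: Dict[Action, float] = {}
    for a in actions:
        pi_a = policy(x0, a)
        q = immediate_reward(x0, a)
        # Later rewards are credited to `a` in proportion to how much more
        # (or less) likely `a` becomes once the future state X_k is known.
        for k in range(1, min(len(states), len(rewards))):
            ratio = hindsight(x0, states[k], a) / pi_a
            q += gamma**k * ratio * rewards[k]
        estimates[a] = q
    return estimates


# Toy example (hypothetical): from "s", "right" leads to state "R", "left" to "L".
def uniform_policy(x: State, a: Action) -> float:
    return 0.5


def toy_hindsight(x: State, x_k: State, a: Action) -> float:
    # Seeing "R" in the future reveals that "right" was taken, and vice versa.
    return 1.0 if (x_k == "R") == (a == "right") else 0.0


if __name__ == "__main__":
    est = hca_q_estimates(
        states=["s", "R"],
        rewards=[0.0, 1.0],
        actions=["left", "right"],
        policy=uniform_policy,
        hindsight=toy_hindsight,
        immediate_reward=lambda x, a: 0.0,
        gamma=0.9,
    )
    print(est)  # {'left': 0.0, 'right': 1.8}
```

In the toy run, one trajectory in which the agent went right yields an estimate for both actions: `left` receives zero credit because the hindsight distribution rules it out, while the ratio h/π scales the reward credited to `right`. Assuming going left always leads to a zero-reward state, averaging over the two equally likely trajectories gives 0.9 for `right`, its true value.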