Paper title
Pareto Conditioned Networks
Paper authors
Paper abstract
In multi-objective optimization, learning all the policies that reach Pareto-efficient solutions is an expensive process. The set of optimal policies can grow exponentially with the number of objectives, and recovering all solutions requires an exhaustive exploration of the entire state space. We propose Pareto Conditioned Networks (PCN), a method that uses a single neural network to encompass all non-dominated policies. PCN associates every past transition with its episode's return. It trains the network so that, when conditioned on that same return, it reenacts the transition. In doing so, we transform the optimization problem into a classification problem. We recover a concrete policy by conditioning the network on the desired Pareto-efficient solution. Our method is stable because it learns in a supervised fashion, thus avoiding moving-target issues. Moreover, by using a single network, PCN scales efficiently with the number of objectives. Finally, it makes minimal assumptions on the shape of the Pareto front, which makes it suitable for a wider range of problems than previous state-of-the-art multi-objective reinforcement learning algorithms.
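The abstract describes the training scheme only at a high level. The Python sketch below illustrates it and rests on clearly stated assumptions.

- A lookup table with majority voting stands in for PCN's single neural network.
- Each transition is labelled with its return-to-go, meaning the undiscounted vector return still to be collected from that step, together with the remaining horizon. This is a common choice in return-conditioned methods, but it may differ from the paper's exact conditioning.
- The names `dominates`, `pareto_front`, `relabel_episode` and `TabularPCN` are hypothetical.

The sketch shows two ideas from the abstract. First, returns are filtered down to the non-dominated (Pareto-efficient) set. Second, past transitions become a supervised classification problem, and querying the model with a desired return recovers a policy.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if `a` is at least as good as `b` in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(returns: Sequence[Vector]) -> List[Vector]:
    """Keep only the non-dominated return vectors (duplicates collapsed, order preserved)."""
    unique = list(dict.fromkeys(returns))
    return [r for r in unique if not any(dominates(o, r) for o in unique)]


def relabel_episode(episode):
    """Turn one episode of (state, action, reward_vector) steps into
    supervised examples ((state, return_to_go, horizon), action).

    Assumption (not taken from the paper): the label is the undiscounted
    vector return still to be collected from that step, plus the number
    of remaining steps.
    """
    examples = []
    rtg = tuple(0.0 for _ in episode[0][2]) if episode else ()
    for t in range(len(episode) - 1, -1, -1):  # walk backwards to accumulate return-to-go
        state, action, reward = episode[t]
        rtg = tuple(g + r for g, r in zip(rtg, reward))
        examples.append(((state, rtg, len(episode) - t), action))
    examples.reverse()
    return examples


class TabularPCN:
    """Hypothetical tabular stand-in for PCN's network: a majority-vote classifier
    mapping (state, desired return, horizon) to an action."""

    def __init__(self) -> None:
        self.counts: Dict[Hashable, Counter] = defaultdict(Counter)

    def fit(self, episodes) -> None:
        # Supervised "training": count which action was taken under each command.
        for episode in episodes:
            for key, action in relabel_episode(episode):
                self.counts[key][action] += 1

    def act(self, state, desired_return, horizon):
        # Condition on a desired return and pick the most frequent matching action.
        votes = self.counts.get((state, tuple(desired_return), horizon))
        if not votes:
            raise KeyError("no transition was observed with this command")
        return votes.most_common(1)[0][0]


# Toy two-objective problem: one state, episodes of length 1.
episodes = [
    [("s0", "left", (1, 0))],
    [("s0", "right", (0, 1))],
    [("s0", "stay", (0, 0))],
]
returns = [tuple(sum(r[i] for _, _, r in ep) for i in range(2)) for ep in episodes]
print(pareto_front(returns))  # [(1, 0), (0, 1)]

model = TabularPCN()
model.fit(episodes)
print(model.act("s0", (1, 0), 1))  # left
print(model.act("s0", (0, 1), 1))  # right
```

Fitting here reduces to counting which action was taken under each command. No bootstrapped target shifts during training, which illustrates the stability argument the abstract makes for learning in a supervised fashion. A real network would also generalize to commands it has not seen exactly, and a lookup table cannot do that.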