Paper title
Improving Behavioural Cloning with Human-Driven Dynamic Dataset Augmentation
Paper authors
Paper abstract
Behavioural cloning has been used extensively to train agents and is recognized as a fast and solid approach for teaching general behaviours from expert trajectories. The method follows the supervised learning paradigm and therefore depends strongly on the distribution of the data. In this paper, we show how combining behavioural cloning with human-in-the-loop training addresses some of its flaws and provides the agent with task-specific corrections to overcome tricky situations, while reducing training time and the resources required. To do this, we introduce a novel approach that allows an expert to take control of the agent at any moment during a simulation and provide optimal solutions to the situations it finds problematic. Our experiments show that this approach leads to better policies in terms of both quantitative evaluation and human-likeness.
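
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: a toy one-dimensional task with the goal at position 0, a lookup-table policy standing in for the cloned agent, and a scripted function `expert_action` standing in for the human expert. All names here (`expert_action`, `fit_policy`, `act`, `rollout`) are hypothetical and are not taken from the paper. The sketch shows how corrections recorded while the expert takes over can be appended to the demonstration dataset before the policy is refit.

```python
from collections import Counter, defaultdict


def expert_action(state):
    """Hypothetical stand-in for the human expert: step toward the goal at 0."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def fit_policy(dataset):
    """Toy behavioural cloning: keep the most frequent expert action per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


def act(policy, state):
    # States missing from the demonstrations fall back to "stay" (0),
    # which is where the cloned agent gets stuck in this toy task.
    return policy.get(state, 0)


def rollout(policy, start, dataset=None, max_steps=20):
    """Run one episode and report whether the goal was reached.

    If `dataset` is given, the expert takes control whenever the agent's
    action differs from the expert's, and the correction is recorded.
    """
    state = start
    for _ in range(max_steps):
        if state == 0:
            return True
        action = act(policy, state)
        if dataset is not None:
            correction = expert_action(state)
            if action != correction:
                dataset.append((state, correction))  # augment the dataset
                action = correction                  # expert drives this step
        state += action
    return state == 0


# Initial demonstrations only cover non-negative positions.
dataset = [(s, expert_action(s)) for s in range(0, 6)]
policy = fit_policy(dataset)
before = rollout(policy, start=-3)            # agent alone, unseen region

# Human-in-the-loop episode: corrections are appended, then the policy is refit.
rollout(policy, start=-3, dataset=dataset)
policy = fit_policy(dataset)
after = rollout(policy, start=-3)             # agent alone again
```

In this toy setting the agent fails from the unseen start state before augmentation and succeeds afterwards, because the recorded corrections fill exactly the gap in the data distribution. A real system would replace the lookup table with a learned model and the scripted expert with live human input.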