Paper Title

What deep reinforcement learning tells us about human motor learning and vice-versa

Paper Authors

Michele Garibbo, Casimir Ludwig, Nathan Lepora, Laurence Aitchison

Paper Abstract

Machine learning, and specifically reinforcement learning (RL), has been extremely successful in helping us to understand neural decision-making processes. However, RL's role in understanding other neural processes, especially motor learning, is much less well explored. To explore this connection, we investigated how recent deep RL methods correspond to the dominant motor learning framework in neuroscience: error-based learning. Error-based learning can be probed using a mirror-reversal adaptation paradigm, where it produces distinctive qualitative predictions that are observed in humans. We therefore tested three major families of modern deep RL algorithms on a mirror-reversal perturbation. Surprisingly, all of the algorithms failed to mimic human behaviour, and indeed displayed behaviour qualitatively different from that predicted by error-based learning. To fill this gap, we introduce a novel deep RL algorithm: model-based deterministic policy gradients (MB-DPG). MB-DPG draws inspiration from error-based learning by explicitly relying on the observed outcome of actions. We show that MB-DPG captures (human) error-based learning under mirror-reversal and rotational perturbations. Next, we demonstrate that error-based learning, in the form of MB-DPG, learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL. These findings highlight the gap between current deep RL methods and human motor adaptation and offer a route to closing it, facilitating future beneficial interaction between the two fields.
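The abstract does not spell out MB-DPG's update rule, so the snippet below is only a minimal, hypothetical sketch of the stated idea: the reaching error is anchored to the observed outcome of the action (as in error-based learning), while a differentiable forward model supplies the action-to-outcome pathway used to credit the policy. All names here (`policy`, `forward_model`, `mb_dpg_step`, `target`, `observed_outcome`) are assumptions for illustration, not the paper's API.

```python
# Illustrative sketch only; not the paper's exact algorithm.
import torch


def mb_dpg_step(policy, forward_model, optimizer, state, target, observed_outcome):
    """One error-based, outcome-anchored policy update (hypothetical).

    The error is evaluated at the *observed* outcome, while the
    differentiable forward model provides the route from action to
    outcome along which that error is propagated into the policy.
    """
    action = policy(state)                            # deterministic action
    predicted_outcome = forward_model(state, action)  # differentiable prediction

    # Gradient of 0.5 * ||outcome - target||^2, evaluated at the observed outcome.
    error = (observed_outcome - target).detach()

    # Surrogate loss whose gradient w.r.t. the action is J_model^T @ error,
    # i.e. the observed error mapped back through the forward model.
    loss = (error * predicted_outcome).sum()

    optimizer.zero_grad()
    loss.backward()   # flows through the forward model into the policy
    optimizer.step()
    return loss.item()
```

On this reading, anchoring the error to the observed outcome rather than to the model's own prediction is what the abstract credits for MB-DPG being more robust to forward-model misspecification than conventional model-based RL.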
