Paper Title

Accelerating Reinforcement Learning Agent with EEG-based Implicit Human Feedback

Paper Authors

Duo Xu, Mohit Agarwal, Ekansh Gupta, Faramarz Fekri, Raghupathy Sivakumar

Paper Abstract

Providing Reinforcement Learning (RL) agents with human feedback can dramatically improve various aspects of learning. However, previous methods require the human observer to give inputs explicitly (e.g., press buttons, use a voice interface), burdening the human in the loop of the RL agent's learning process. Further, it is sometimes difficult or impossible to obtain explicit human advice (feedback), e.g., in autonomous driving, rehabilitation for the disabled, etc. In this work, we investigate capturing a human's intrinsic reactions as implicit (and natural) feedback through EEG in the form of error-related potentials (ErrPs), providing a natural and direct way for humans to improve RL agent learning. As such, human intelligence can be integrated via implicit feedback with RL algorithms to accelerate the learning of the RL agent. We develop three reasonably complex 2D discrete navigational games to experimentally evaluate the overall performance of the proposed work. The major contributions of our work are as follows: (i) we propose and experimentally validate the zero-shot learning of ErrPs, where the ErrPs can be learned for one game and transferred to other unseen games; (ii) we propose a novel RL framework for integrating implicit human feedback via ErrPs with the RL agent, improving label efficiency and robustness to human mistakes; and (iii) compared to prior works, we scale the application of ErrPs to reasonably complex environments and demonstrate the significance of our approach for accelerated learning through real user experiments.
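
To make the core idea concrete, below is a minimal sketch of how decoded ErrP feedback could shape the reward signal inside a tabular Q-learning loop. This is an illustration only, not the authors' implementation: the toy 1D navigation task, the noisy `errp_detector` stub, and hyperparameters such as `ERRP_PENALTY` are all assumptions introduced here, and the paper's actual framework additionally addresses label efficiency and zero-shot transfer across games.

```python
# Minimal sketch (not the paper's code): reward shaping with noisy
# ErrP-decoded implicit feedback in a tabular Q-learning loop.
import random

N_STATES, N_ACTIONS = 10, 2   # toy 1D navigation: action 0 = left, 1 = right
GOAL = N_STATES - 1
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ERRP_PENALTY = -0.5           # shaping reward applied when an ErrP is decoded

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    """Toy transition: action 1 moves right, action 0 moves left."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def errp_detector(state, action):
    """Stand-in for a single-trial EEG ErrP classifier.

    In this toy task moving left is always suboptimal; the detector is
    deliberately noisy (85% accurate) to mimic imperfect ErrP decoding.
    """
    is_error = (action == 0)
    return is_error if random.random() < 0.85 else not is_error

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Augment the sparse environment reward with implicit human feedback.
        if errp_detector(state, action):
            reward += ERRP_PENALTY
        target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state

print("Greedy policy:",
      [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

The deliberately noisy detector reflects the setting motivating contribution (ii): any practical framework must remain useful when single-trial ErrP decoding, and the human reaction itself, is sometimes wrong.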
