Paper Title

Learning Latent Representations to Co-Adapt to Humans

Paper Authors

Sagar Parekh, Dylan P. Losey

Paper Abstract

When robots interact with humans in homes, roads, or factories, the human's behavior often changes in response to the robot. Non-stationary humans are challenging for robot learners: actions the robot has learned to coordinate with the original human may fail after the human adapts to the robot. In this paper, we introduce an algorithmic formalism that enables robots (i.e., ego agents) to co-adapt alongside dynamic humans (i.e., other agents) using only the robot's low-level states, actions, and rewards. A core challenge is that humans not only react to the robot's behavior, but the way in which humans react inevitably changes both over time and between users. To deal with this challenge, our insight is that -- instead of building an exact model of the human -- robots can learn and reason over high-level representations of the human's policy and policy dynamics. Applying this insight, we develop RILI: Robustly Influencing Latent Intent. RILI first embeds low-level robot observations into predictions of the human's latent strategy and strategy dynamics. Next, RILI harnesses these predictions to select actions that influence the adaptive human towards advantageous, high-reward behaviors over repeated interactions. We demonstrate that -- given RILI's measured performance with users sampled from an underlying distribution -- we can probabilistically bound RILI's expected performance across new humans sampled from the same distribution. Our simulated experiments compare RILI to state-of-the-art representation and reinforcement learning baselines, and show that RILI better learns to coordinate with imperfect, noisy, and time-varying agents. Finally, we conduct two user studies where RILI co-adapts alongside actual humans in a game of tag and a tower-building task. See videos of our user studies here: https://youtu.be/WYGO5amDXbQ
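
The abstract describes a two-part structure: an encoder that embeds low-level interaction histories into predictions of the human's latent strategy and strategy dynamics, and a policy that conditions on those predictions to influence the human. The paper's actual architecture is not given here, so the following is only a minimal sketch in PyTorch under assumed names and dimensions (`StrategyEncoder`, `LatentConditionedPolicy`, `latent_dim`, and all layer sizes are hypothetical, not RILI's implementation):

```python
import torch
import torch.nn as nn

class StrategyEncoder(nn.Module):
    """Embed a history of low-level robot observations (states, actions,
    rewards) into a prediction of the human's current latent strategy and
    its dynamics (the strategy expected at the next interaction).
    Architecture and shapes are illustrative assumptions."""
    def __init__(self, obs_dim: int, latent_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.to_latent = nn.Linear(hidden_dim, latent_dim)       # current strategy z_t
        self.to_next_latent = nn.Linear(hidden_dim, latent_dim)  # predicted strategy z_{t+1}

    def forward(self, history: torch.Tensor):
        # history: (batch, num_interactions, obs_dim)
        _, h = self.rnn(history)
        h = h.squeeze(0)
        return self.to_latent(h), self.to_next_latent(h)

class LatentConditionedPolicy(nn.Module):
    """Select robot actions conditioned on the current state and the
    predicted latent strategy, so the learned policy can steer the
    adaptive human toward high-reward behaviors."""
    def __init__(self, state_dim: int, latent_dim: int,
                 action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, state: torch.Tensor, z_next: torch.Tensor):
        return self.net(torch.cat([state, z_next], dim=-1))

# Usage: predict the human's next strategy from ten past interactions,
# then choose an action against that prediction.
encoder = StrategyEncoder(obs_dim=8, latent_dim=4)
policy = LatentConditionedPolicy(state_dim=6, latent_dim=4, action_dim=2)
history = torch.randn(1, 10, 8)
z, z_next = encoder(history)
action = policy(torch.randn(1, 6), z_next)
```

In practice the encoder would be trained on past interactions and the policy with reinforcement learning on the robot's reward; this sketch only shows how a latent strategy prediction can sit between low-level observations and action selection.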
