Paper Title

Efficient Exploration via First-Person Behavior Cloning Assisted Rapidly-Exploring Random Trees

Paper Authors

Max Zuo, Logan Schick, Matthew Gombolay, Nakul Gopalan

Paper Abstract

Modern-day computer games have extremely large state and action spaces. To detect bugs in these games' models, human testers play the games repeatedly to explore them and find errors. Such gameplay is exhaustive and time-consuming. Moreover, since robotics simulators depend on similar methods of model specification and debugging, the problem of finding errors in the model is also of interest to the robotics community, to ensure that robot behaviors and interactions are consistent in simulators. Previous methods have used reinforcement learning (arXiv:2103.13798) and search-based methods (Chang, 2019; Chang, 2021; arXiv:1811.06962), including Rapidly-exploring Random Trees (RRT), to explore a game's state-action space and find bugs. However, such search- and exploration-based methods are not efficient at exploring the state-action space without a pre-defined heuristic. In this work, we attempt to combine a human tester's expertise in solving games with the RRT's exhaustiveness to search a game's state space efficiently and with high coverage. This paper introduces Cloning Assisted RRT (CA-RRT) to test a game through search. We compare our method to two existing baselines: 1) weighted-RRT, as described in arXiv:1812.03125, and 2) human-demonstration-seeded RRT, as described by Chang et al. We find that CA-RRT is applicable to more game maps and explores more game states in fewer tree expansions/iterations than the existing baselines. In each test, CA-RRT reached more states on average in the same number of iterations as weighted-RRT. In our tested environments, CA-RRT reached the same number of states as weighted-RRT in more than 5,000 fewer iterations on average, almost a 50% reduction, and applied to more scenarios. Moreover, as a consequence of our first-person behavior cloning approach, CA-RRT works on unseen game maps, unlike approaches that only seed the RRT with human-demonstrated states.
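
To make the combination of behavior cloning and RRT expansion more concrete, below is a minimal Python sketch of how a policy cloned from first-person human demonstrations might bias RRT tree growth. The environment interface (GameEnv with reset, set_state, step, observe, sample_state, sample_action, distance), the bc_policy callable, and the mixing parameter bc_prob are illustrative assumptions, not the paper's actual CA-RRT implementation.

    # Minimal sketch of behavior-cloning-assisted RRT exploration.
    # All environment/policy names below are assumed placeholders.
    import random

    class CARRTSketch:
        def __init__(self, env, bc_policy, bc_prob=0.5):
            self.env = env                # simulator with an assumed reset/set_state/step interface
            self.bc_policy = bc_policy    # policy cloned from first-person human demos (assumed)
            self.bc_prob = bc_prob        # chance of expanding with the cloned policy vs. randomly
            self.tree = [env.reset()]     # tree nodes store reachable game states

        def nearest(self, target):
            # Nearest tree node under an assumed state-space distance metric.
            return min(self.tree, key=lambda s: self.env.distance(s, target))

        def expand(self):
            # Standard RRT step: sample a target state and grow from its nearest tree node.
            target = self.env.sample_state()
            node = self.nearest(target)
            self.env.set_state(node)
            if random.random() < self.bc_prob:
                # Expansion guided by the behavior-cloned policy on a first-person observation.
                action = self.bc_policy(self.env.observe())
            else:
                # Ordinary random expansion, as in a plain RRT.
                action = self.env.sample_action()
            new_state = self.env.step(action)
            self.tree.append(new_state)
            return new_state

    # Usage sketch (GameEnv and bc_policy stand in for a real simulator and a trained cloned policy):
    # explorer = CARRTSketch(GameEnv(), bc_policy)
    # for _ in range(10_000):
    #     explorer.expand()
    # print("states reached:", len(explorer.tree))

Because the cloned policy conditions only on first-person observations rather than on map-specific states, a loop of this shape could, in principle, be applied to game maps the human testers never demonstrated, which is the generalization behavior the abstract reports for CA-RRT.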
