Paper Title


Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks

Paper Authors

Edwin Zhang, Yujie Lu, Shinda Huang, William Wang, Amy Zhang

Paper Abstract


Training generalist agents is difficult across several axes, requiring us to deal with high-dimensional inputs (space), long horizons (time), and generalization to novel tasks. Recent advances in architectures have allowed for improved scaling along one or two of these axes, but are still computationally prohibitive to use. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models as a hierarchical planner conditioned on language (LCD). We effectively and efficiently scale diffusion models for planning in extended temporal, state, and task dimensions to tackle long-horizon control problems conditioned on natural language instructions, as a step towards generalist agents. Comparing LCD with other state-of-the-art models on the CALVIN language robotics benchmark finds that LCD outperforms other SOTA methods in multi-task success rate, while improving inference speed over other comparable diffusion models by 3.3x to 15x. We show that LCD can successfully leverage the unique strength of diffusion models to produce coherent long-range plans while addressing their weakness in generating low-level details and control.
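The hierarchical idea the abstract describes, where a diffusion model samples a coherent high-level plan conditioned on a language embedding and a separate low-level policy fills in the control details, can be sketched roughly as follows. This is a toy illustration only: the `denoise_step` function, the fixed "language embedding", and the proportional controller are all hypothetical stand-ins for the trained networks in the paper, not the actual LCD implementation.

```python
import numpy as np

def denoise_step(plan, lang_emb):
    # Hypothetical denoiser: nudges the noisy plan toward the goal
    # encoded by the language embedding. A trained diffusion model
    # would predict and subtract noise here instead.
    goal = np.tile(lang_emb, (plan.shape[0], 1))
    return plan + 0.5 * (goal - plan)

def diffusion_plan(lang_emb, horizon=8, state_dim=4, steps=10, seed=0):
    """High level: sample a plan of latent states by iterative
    denoising, conditioned on a language embedding."""
    rng = np.random.default_rng(seed)
    plan = rng.normal(size=(horizon, state_dim))  # start from pure noise
    for _ in range(steps):
        plan = denoise_step(plan, lang_emb)
    return plan

def low_level_policy(state, subgoal):
    """Low level: a simple proportional controller tracking the next
    subgoal (stand-in for a learned goal-conditioned policy)."""
    return np.clip(subgoal - state, -1.0, 1.0)

# Pretend embedding of an instruction like "lift the red block".
lang_emb = np.full(4, 0.3)
plan = diffusion_plan(lang_emb)          # coherent long-range plan
state = np.zeros(4)
for subgoal in plan:                     # low-level policy executes it
    state = state + 0.1 * low_level_policy(state, subgoal)
```

The division of labor mirrors the abstract's claim: the diffusion model only has to produce a temporally coherent sequence of subgoals, while the cheap low-level controller handles the fine-grained actions it is poorly suited to generate.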
