Paper Title

Online Decision Transformer

Paper Authors

Qinqing Zheng, Amy Zhang, Aditya Grover

Paper Abstract

Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant gains during the finetuning procedure.
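
The abstract describes combining an autoregressive action-modeling objective with a sequence-level entropy regularizer to drive exploration during online finetuning. The snippet below is a minimal, hypothetical sketch of how such a loss could be assembled in PyTorch; `GaussianActionHead`, `odt_losses`, the tensor shapes, and the target-entropy value are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch: autoregressive (negative log-likelihood) action loss
# plus a sequence-level entropy constraint enforced via a learnable
# temperature (dual variable). Names and shapes are assumptions for
# illustration only.
import torch
import torch.nn as nn


class GaussianActionHead(nn.Module):
    """Maps transformer hidden states to a diagonal Gaussian over actions."""

    def __init__(self, hidden_dim: int, act_dim: int):
        super().__init__()
        self.mean = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Linear(hidden_dim, act_dim)

    def forward(self, h):
        mean = self.mean(h)
        log_std = self.log_std(h).clamp(-5.0, 2.0)  # keep std in a sane range
        return torch.distributions.Normal(mean, log_std.exp())


def odt_losses(action_dist, target_actions, log_temperature, target_entropy):
    """NLL action loss with an entropy term, averaged over batch and sequence."""
    nll = -action_dist.log_prob(target_actions).sum(dim=-1).mean()
    entropy = action_dist.entropy().sum(dim=-1).mean()
    temperature = log_temperature.exp().detach()
    # Policy loss: fit the offline/online actions while keeping entropy high.
    policy_loss = nll - temperature * entropy
    # Dual (temperature) loss: push the policy entropy toward the target value.
    temperature_loss = log_temperature.exp() * (entropy.detach() - target_entropy)
    return policy_loss, temperature_loss


# Usage with made-up shapes: batch of 4 sequences, length 20, hidden size 128,
# 6-dimensional continuous actions.
hidden = torch.randn(4, 20, 128)
actions = torch.randn(4, 20, 6)
head = GaussianActionHead(128, 6)
log_temp = torch.zeros((), requires_grad=True)
policy_loss, temp_loss = odt_losses(head(hidden), actions, log_temp, target_entropy=-6.0)
```

In this sketch the two losses would be minimized with separate optimizers, mirroring the dual-variable treatment of entropy constraints common in maximum-entropy RL.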
