Improving Generative Imagination in Object-Centric World Models

Paper Title

Improving Generative Imagination in Object-Centric World Models

Authors

Lin, Zhixuan; Wu, Yi-Fu; Peri, Skand; Fu, Bofeng; Jiang, Jindong; Ahn, Sungjin

Abstract

The remarkable recent advances in object-centric generative world models raise a few questions. First, while many of the recent achievements are indispensable for making a general and versatile world model, it is quite unclear how these ingredients can be integrated into a unified framework. Second, despite using generative objectives, abilities for object detection and tracking are mainly investigated, leaving the crucial ability of temporal imagination largely under question. Third, a few key abilities for more faithful temporal imagination such as multimodal uncertainty and situation-awareness are missing. In this paper, we introduce Generative Structured World Models (G-SWM). The G-SWM achieves the versatile world modeling not only by unifying the key properties of previous models in a principled framework but also by achieving two crucial new abilities, multimodal uncertainty and situation-awareness. Our thorough investigation on the temporal generation ability in comparison to the previous models demonstrates that G-SWM achieves the versatility with the best or comparable performance for all experiment settings including a few complex settings that have not been tested before.
