Active Inference for Autonomous Decision-Making with Contextual Multi-Armed Bandits

Paper Title

Active Inference for Autonomous Decision-Making with Contextual Multi-Armed Bandits

Authors

Shohei Wakayama, Nisar Ahmed

Abstract

In autonomous robotic decision-making under uncertainty, the tradeoff between exploitation and exploration of available options must be considered. If secondary information associated with options can be utilized, such decision-making problems can often be formulated as contextual multi-armed bandits (CMABs). In this study, we apply active inference, which has been actively studied in the field of neuroscience in recent years, as an alternative action selection strategy for CMABs. Unlike conventional action selection strategies, it is possible to rigorously evaluate the uncertainty of each option when calculating the expected free energy (EFE) associated with the decision agent's probabilistic model, as derived from the free-energy principle. We specifically address the case where a categorical observation likelihood function is used, such that EFE values are analytically intractable. We introduce new approximation methods for computing the EFE based on variational and Laplace approximations. Extensive simulation study results demonstrate that, compared to other strategies, active inference generally requires far fewer iterations to identify optimal options and generally achieves superior cumulative regret, for relatively low extra computational cost.
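To make the idea of scoring options by expected free energy concrete, the following is a minimal sketch under strong simplifying assumptions, and it is not the method of the paper. It drops the context entirely and gives each arm a Dirichlet belief over categorical outcome probabilities, a conjugate toy case in which the EFE has a closed form. The paper instead addresses a categorical likelihood for which the EFE is analytically intractable and must be approximated. The function names (`digamma`, `expected_free_energy`, `select_arm`) and the log-preference vector `log_pref` are hypothetical and chosen only for illustration.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    while x < 6.0:  # shift the argument up until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def expected_free_energy(alpha, log_pref):
    """Toy EFE of one arm with a Dirichlet(alpha) belief over outcomes.

    EFE = -(expected log-preference) - (expected information gain).
    """
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]  # predictive outcome probabilities
    # Pragmatic term: how well predicted outcomes match the preferences.
    pragmatic = sum(m * lp for m, lp in zip(mean, log_pref))
    # Epistemic term: mutual information between the next outcome and
    # the unknown outcome probabilities (predictive entropy minus the
    # expected entropy under the Dirichlet belief).
    pred_entropy = -sum(m * math.log(m) for m in mean if m > 0)
    exp_entropy = -sum(
        m * (digamma(a + 1) - digamma(a0 + 1)) for m, a in zip(mean, alpha)
    )
    return -pragmatic - (pred_entropy - exp_entropy)


def select_arm(alphas, log_pref):
    """Pick the arm with the lowest expected free energy."""
    scores = [expected_free_energy(a, log_pref) for a in alphas]
    return min(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two arms with the same predicted outcomes; arm 0 is far less certain.
    arms = [[1.0, 1.0], [50.0, 50.0]]
    print(select_arm(arms, log_pref=[0.0, 0.0]))
```

With flat preferences, only the information-gain term differs between the two arms, so the less certain arm gets the lower EFE and is selected. This is the sense in which the uncertainty of each option enters the action choice in this toy setting.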
