Paper Title

Convergence of Policy Iteration for Entropy-Regularized Stochastic Control Problems

Authors

Yu-Jui Huang, Zhenhua Wang, Zhou Zhou

Abstract

For a general entropy-regularized stochastic control problem on an infinite horizon, we prove that a policy iteration algorithm (PIA) converges to an optimal relaxed control. Contrary to the standard stochastic control literature, classical Hölder estimates of value functions do not ensure the convergence of the PIA, due to the added entropy-regularizing term. To circumvent this, we carry out a delicate estimation by moving back and forth between appropriate Hölder and Sobolev spaces. This requires new Sobolev estimates designed specifically for the purpose of policy iteration and a nontrivial technique to contain the entropy growth. Ultimately, we obtain a uniform Hölder bound for the sequence of value functions generated by the PIA, thereby achieving the desired convergence result. Characterization of the optimal value function as the unique solution to an exploratory Hamilton-Jacobi-Bellman equation comes as a by-product. The PIA is numerically implemented in an example of optimal consumption.
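To illustrate how entropy regularization reshapes the policy-improvement step, here is a minimal sketch of entropy-regularized policy iteration on a small finite, discrete-time Markov decision process. This is only a discrete-time analogue of the paper's continuous-time diffusion setting, not the authors' implementation; all names, parameters, and the random problem data are assumptions. The point it demonstrates is that the entropy term turns the usual argmax improvement into a Gibbs (softmax) relaxed policy.

```python
import numpy as np

# Illustrative sketch: entropy-regularized policy iteration on a finite MDP.
# A discrete-time analogue of the paper's setting; all data here is synthetic.
rng = np.random.default_rng(0)
nS, nA = 5, 3          # number of states and actions (assumed for the demo)
gamma, lam = 0.9, 0.5  # discount factor and entropy-regularization weight

P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = next-state distribution
R = rng.standard_normal((nS, nA))              # reward R[s, a]

pi = np.full((nS, nA), 1.0 / nA)  # start from the uniform relaxed control

for it in range(200):
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi + lam * H(pi),
    # where the entropy of the current relaxed policy enters the reward.
    r_pi = np.einsum('sa,sa->s', pi, R)
    H = -np.einsum('sa,sa->s', pi, np.log(pi))
    P_pi = np.einsum('sa,sap->sp', pi, P)
    v = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi + lam * H)

    # Policy improvement: with the entropy term, the maximizer is not an
    # argmax but a Gibbs (softmax) distribution over the Q-values.
    Q = R + gamma * np.einsum('sap,p->sa', P, v)
    pi_new = np.exp((Q - Q.max(axis=1, keepdims=True)) / lam)
    pi_new /= pi_new.sum(axis=1, keepdims=True)

    if np.max(np.abs(pi_new - pi)) < 1e-10:
        break
    pi = pi_new

print(f"converged after {it + 1} iterations; v = {np.round(v, 3)}")
```

The fixed point of this softmax improvement step is a discrete counterpart of the exploratory Hamilton-Jacobi-Bellman characterization mentioned in the abstract: at convergence, the relaxed policy is the Gibbs measure generated by the Q-values of its own value function.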
