Paper Title
Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems
Paper Authors
Paper Abstract
Typical reinforcement learning (RL) methods show limited applicability to real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. The algorithm has two main features. First, we design two distance-based Q-value update schemes, the incentive update and the penalty update, within a distance-based incentive/penalty (DIP) update technique, enabling the agent to decide discrete and continuous actions in the feasible region and to update the values of both action types. Second, we propose a method that defines the penalty cost as a shadow price-weighted penalty. Compared with previous methods, this approach offers two advantages in efficiently discouraging the agent from selecting infeasible actions. We apply our algorithm to an industrial control problem, microgrid system operation, and the experimental results demonstrate its superiority.
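The abstract does not give the exact update rule, but the described mechanism can be sketched as a tabular Q-learning step whose temporal-difference target is shifted by a distance term: a minimal sketch, assuming a simple additive form in which a feasible action earns an incentive proportional to its distance from the constraint boundary, while an infeasible action incurs a penalty equal to its constraint-violation distance weighted by a shadow price. The function and parameter names (`dip_update`, `kappa`, `shadow_price`) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def dip_update(Q, state, action, reward, next_state,
               feasible, distance, shadow_price,
               alpha=0.1, gamma=0.99, kappa=1.0):
    """One hypothetical DIP-style tabular Q-value update.

    feasible     -- whether the chosen action lies in the feasible region
    distance     -- distance to the constraint boundary (feasible case)
                    or magnitude of the constraint violation (infeasible case)
    shadow_price -- dual-variable weight applied to the violation penalty
    kappa        -- assumed scaling of the feasibility incentive
    """
    if feasible:
        # Incentive update: TD target augmented with a distance-based bonus
        target = reward + kappa * distance + gamma * np.max(Q[next_state])
    else:
        # Penalty update: shadow price-weighted penalty on the violation distance
        target = reward - shadow_price * distance + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
    return Q
```

Under this sketch, a feasible action deep inside the feasible region is valued upward, while an infeasible action is pushed down in proportion to how strongly the violated constraint binds, which is the intuition behind weighting the penalty by a shadow price.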