Paper Title

Demand-Side Scheduling Based on Multi-Agent Deep Actor-Critic Learning for Smart Grids

Authors

Joash Lee, Wenbo Wang, Dusit Niyato

Abstract

We consider the problem of demand-side energy management, where each household is equipped with a smart meter that is able to schedule home appliances online. The goal is to minimize the overall cost under a real-time pricing scheme. While previous works have introduced centralized approaches in which the scheduling algorithm has full observability, we propose the formulation of a smart grid environment as a Markov game. Each household is a decentralized agent with partial observability, which allows scalability and privacy-preservation in a realistic setting. The grid operator produces a price signal that varies with the energy demand. We propose an extension to a multi-agent, deep actor-critic algorithm to address partial observability and the perceived non-stationarity of the environment from the agent's viewpoint. This algorithm learns a centralized critic that coordinates training of decentralized agents. Our approach thus uses centralized learning but decentralized execution. Simulation results show that our online deep reinforcement learning method can reduce both the peak-to-average ratio of total energy consumed and the cost of electricity for all households based purely on instantaneous observations and a price signal.
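To make the "centralized learning, decentralized execution" structure concrete, below is a minimal Python/PyTorch sketch, not the authors' implementation. It assumes placeholder class names (Actor, CentralizedCritic), agent counts, and observation/action dimensions. Each household actor conditions only on its own local observation (e.g. appliance states plus the price signal), while a critic used only during training conditions on the joint observations and actions of all agents.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: maps one household's local observation
    (hypothetically, appliance states and the current price signal)
    to logits over scheduling actions."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)  # action logits

class CentralizedCritic(nn.Module):
    """Critic used only during training: scores the joint
    observations and actions of all agents, which is what mitigates
    the non-stationarity each agent would otherwise perceive."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Illustrative setup (all dimensions are assumptions): 3 households,
# each with a 6-dim local observation and 4 discrete scheduling actions.
n_agents, obs_dim, act_dim = 3, 6, 4
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critic = CentralizedCritic(n_agents * obs_dim, n_agents * act_dim)

# Decentralized execution: each actor acts on its own observation only.
local_obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
actions = [torch.softmax(actor(o), dim=-1) for actor, o in zip(actors, local_obs)]

# Centralized learning: the critic evaluates the joint state-action pair.
value = critic(torch.cat(local_obs, dim=-1), torch.cat(actions, dim=-1))
print(value.shape)  # torch.Size([1, 1])
```

At deployment only the actors are needed, so each smart meter acts on local data alone; the critic, and the joint information it requires, is used solely during training, which is what the abstract means by centralized learning with decentralized execution.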
