Paper title
Optimized cost function for demand response coordination of multiple EV charging stations using reinforcement learning
Paper authors
Paper abstract
Electric vehicle (EV) charging stations represent a substantial load with significant flexibility. Exploiting that flexibility in demand response (DR) algorithms becomes increasingly important to manage and balance demand and supply in power grids. Model-free DR based on reinforcement learning (RL) is an attractive approach to balance such EV charging load. We build on previous RL research based on a Markov decision process (MDP) to simultaneously coordinate multiple charging stations. However, we note that the computationally expensive cost function adopted in the previous research leads to long training times, which limits the feasibility and practicality of the approach. We therefore propose an improved cost function that essentially forces the learned control policy to always fulfill any charging demand that does not offer any flexibility. We rigorously compare the newly proposed batch RL fitted Q-iteration implementation with the original (costly) one, using real-world data. Specifically, for the case of load flattening, we compare the two approaches in terms of (i) the processing time to learn the RL-based charging policy, as well as (ii) the overall performance of the policy decisions in terms of meeting the target load for unseen test data. The performance is analyzed for different training periods and varying training sample sizes. In addition to the performance results of both RL policies, we provide performance bounds in terms of both (i) an optimal all-knowing strategy, and (ii) a simple heuristic that spreads individual EV charging uniformly over time.
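To make the approach described in the abstract concrete, the sketch below shows a minimal batch-mode fitted Q-iteration loop with a quadratic load-flattening cost. This is an illustrative assumption-laden example, not the authors' implementation: the state and action encodings, the toy transition generator, the regressor choice (extremely randomized trees), and all parameter names are assumptions introduced here for clarity.

```python
# Illustrative sketch only: minimal fitted Q-iteration with a load-flattening
# cost. NOT the paper's implementation; state/action encodings, the toy data
# generator, and all hyperparameters are assumptions.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

def flattening_cost(charged_power):
    """Quadratic penalty on the power delivered in a slot; minimizing the
    sum of squares over slots encourages a flat aggregate load profile."""
    return charged_power ** 2

def sample_transitions(n=500, horizon=24):
    """Toy batch of one-step transitions (s, a, c, s') from charging sessions.
    A state is (time slot, remaining energy demand in kWh)."""
    s = np.column_stack([rng.integers(0, horizon, n),   # time slot
                         rng.uniform(0, 10, n)])        # energy still to deliver
    a = rng.uniform(0, 3, n)                            # charging power this slot (kW)
    delivered = np.minimum(a, s[:, 1])
    c = flattening_cost(delivered)                      # per-slot flattening cost
    s_next = np.column_stack([(s[:, 0] + 1) % horizon,
                              s[:, 1] - delivered])
    return s, a, c, s_next

def fitted_q_iteration(s, a, c, s_next, n_iter=20, gamma=0.95, action_grid=None):
    """Batch-mode FQI: repeatedly regress Bellman targets on (state, action)."""
    if action_grid is None:
        action_grid = np.linspace(0, 3, 7)              # candidate charging powers
    X = np.column_stack([s, a])
    q = None
    for _ in range(n_iter):
        if q is None:
            targets = c                                  # first pass: Q ~ immediate cost
        else:
            # Bellman backup: cost plus discounted minimum over candidate actions
            next_q = np.stack(
                [q.predict(np.column_stack([s_next, np.full(len(s_next), u)]))
                 for u in action_grid],
                axis=1,
            )
            targets = c + gamma * next_q.min(axis=1)
        q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, targets)
    return q

s, a, c, s_next = sample_transitions()
q_function = fitted_q_iteration(s, a, c, s_next)
```

In this sketch the flattening objective is purely a cost term; the improved cost function described in the abstract additionally ensures that charging demand without remaining flexibility is always fulfilled, which in a practical controller would appear as a hard constraint on (or heavy penalty against) deferring such demand.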