Multi-unit Double Auctions: Equilibrium Analysis and Bidding Strategy using DDPG in Smart-grids

Paper Title

Multi-unit Double Auctions: Equilibrium Analysis and Bidding Strategy using DDPG in Smart-grids

Authors

Sanjay Chandlekar, Easwar Subramanian, Sanjay Bhat, Praveen Paruchuri, Sujit Gujar

Abstract

Periodic double auctions (PDA) have applications in many areas such as in e-commerce, intra-day equity markets, and day-ahead energy markets in smart-grids. While the trades accomplished using PDAs are worth trillions of dollars, finding a reliable bidding strategy in such auctions is still a challenge as it requires the consideration of future auctions. A participating buyer in a PDA has to design its bidding strategy by planning for current and future auctions. Many equilibrium-based bidding strategies proposed are complex to use in real-time. In the current exposition, we propose a scale-based bidding strategy for buyers participating in PDA. We first present an equilibrium analysis for single-buyer single-seller multi-unit single-shot k-Double auctions. Specifically, we analyze the situation when a seller and a buyer trade two identical units of quantity in a double auction where both the buyer and the seller deploy a simple, scale-based bidding strategy. The equilibrium analysis becomes intractable as the number of participants increases. To be useful in more complex settings such as wholesale markets in smart-grids, we model equilibrium bidding strategy as a learning problem. We develop a deep deterministic policy gradient (DDPG) based learning strategy, DDPGBBS, for a participating agent in PDAs to suggest an action at any auction instance. DDPGBBS, which empirically follows the obtained theoretical equilibrium, is easily extendable when the number of buyers/sellers increases. We take Power Trading Agent Competition's (PowerTAC) wholesale market PDA as a testbed to evaluate our novel bidding strategy. We benchmark our DDPG based strategy against several baselines and state-of-the-art bidding strategies of the PowerTAC wholesale market PDA and demonstrate the efficacy of DDPGBBS against several benchmarked strategies.
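
To make the k-double auction setting concrete, here is a minimal single-unit sketch. It is not the paper's method: it assumes the standard k-double auction rule (trade occurs when the bid is at least the ask, at price k·bid + (1−k)·ask) and reads "scale-based" as multiplying a private valuation or cost by a scale factor. The function names and the numeric values are hypothetical, chosen only for illustration; the paper itself analyzes a two-unit case and learns the strategy with DDPG.

```python
from typing import Optional


def scaled_offer(private_value: float, scale: float) -> float:
    """Scale-based offer (assumed form): private value or cost times a scale factor."""
    return scale * private_value


def k_double_auction_price(bid: float, ask: float, k: float = 0.5) -> Optional[float]:
    """Clearing price of a single-unit k-double auction, or None if no trade.

    Assumes the standard rule: trade happens iff bid >= ask,
    at price k * bid + (1 - k) * ask with k in [0, 1].
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    if bid < ask:
        return None  # buyer's bid is below seller's ask: no trade
    return k * bid + (1.0 - k) * ask


# Hypothetical numbers: the buyer shades its valuation down, the seller marks its cost up.
buyer_bid = scaled_offer(private_value=8.0, scale=0.75)   # 6.0
seller_ask = scaled_offer(private_value=4.0, scale=1.25)  # 5.0
price = k_double_auction_price(buyer_bid, seller_ask, k=0.5)  # 5.5
```

With k = 0.5 the price splits the gap between bid and ask evenly; k = 1 would set the price at the buyer's bid and k = 0 at the seller's ask.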
