Paper Title
Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions
Authors
Abstract
Motivated by pricing in ad exchange markets, we consider the problem of robustly learning reserve prices against strategic buyers in repeated contextual second-price auctions. Buyers' valuations for an item depend on the context that describes the item. However, the seller does not know the relationship between the context and buyers' valuations, i.e., buyers' preferences. The seller's goal is to design a learning policy that sets reserve prices by observing past sales data, and her objective is to minimize her revenue regret, where the regret is computed against a clairvoyant policy that knows buyers' heterogeneous preferences. Given the seller's goal, utility-maximizing buyers have an incentive to bid untruthfully in order to manipulate the seller's learning policy. We propose learning policies that are robust to such strategic behavior. These policies use the outcomes of the auctions, rather than the submitted bids, to estimate the preferences while controlling the long-term effect of the outcome of each auction on future reserve prices. When the market noise distribution is known to the seller, we propose a policy called Contextual Robust Pricing (CORP) that achieves a $T$-period regret of $O(d \log(Td) \log(T))$, where $d$ is the dimension of the contextual information. When the market noise distribution is unknown to the seller, we propose two policies whose regrets are sublinear in $T$.
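To make the setting concrete, the following is a minimal sketch of the seller's revenue in a single second-price auction with a reserve price, and of cumulative regret measured against a benchmark policy. It is not the CORP policy or any method from the paper. The function names `second_price_revenue` and `cumulative_regret` are hypothetical, and the tie rule (a highest bid exactly equal to the reserve still wins) is an assumption made for illustration.

```python
from typing import Sequence


def second_price_revenue(bids: Sequence[float], reserve: float) -> float:
    """Seller revenue in one second-price auction with a reserve price.

    Assumption (illustrative): a highest bid equal to the reserve still wins.
    """
    if not bids:
        return 0.0
    ranked = sorted(bids, reverse=True)
    highest = ranked[0]
    if highest < reserve:
        # No bid clears the reserve, so the item goes unsold.
        return 0.0
    # With a single bidder there is no second bid; the reserve sets the price.
    second = ranked[1] if len(ranked) > 1 else 0.0
    # The winner pays the larger of the second-highest bid and the reserve.
    return max(second, reserve)


def cumulative_regret(
    benchmark_revenues: Sequence[float], policy_revenues: Sequence[float]
) -> float:
    """Sum of per-period revenue gaps between a benchmark and a policy."""
    return sum(b - p for b, p in zip(benchmark_revenues, policy_revenues))


if __name__ == "__main__":
    # Reserve binds: bids 5 and 3 with reserve 4 yield a price of 4.
    print(second_price_revenue([5.0, 3.0], reserve=4.0))
    # Reserve too high: no sale, zero revenue.
    print(second_price_revenue([5.0, 3.0], reserve=6.0))
```

The sketch illustrates why the reserve price matters: setting it between the two highest bids raises revenue, while setting it above the highest bid loses the sale entirely.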