论文标题
一个实用的分布式ADMM求解器,用于十亿级的通用作业问题
A Practical Distributed ADMM Solver for Billion-Scale Generalized Assignment Problems
论文作者
论文摘要
将项目分配给所有者是在各种现实世界应用程序中发现的一个常见问题,例如,营销活动中的受众渠道匹配,贷款管理中的借款人和借款人匹配以及在电子商务中的购物者搭配。给定目标和多个约束,可以将分配问题提出为受约束的优化问题。此类分配问题通常是NP-HARD,因此,当项目数量或所有者数量很大时,解决精确解决方案就会变得具有挑战性。在本文中,我们有兴趣解决数亿个项目的约束分配问题。因此,只有数十个所有者,决策变量的数量在十亿级。这个量表通常在互联网行业中可以看到,该行业为大量用户做出决策。我们放宽可能的整数约束,并制定一个涵盖常见的分配问题的一般优化问题。它的目标函数是凸。它的约束是线性的,要么是凸的,并且可以通过项目分离。我们研究以在Bregman交替方向方法(badmm)框架中解决广义分配问题,在此我们利用Bregman的差异将增强的Lagrangian转变为可分离的形式,并在并行解决许多子问题。因此,可以使用MapReduce式分布式计算框架实现整个解决方案。我们介绍了合成数据集和现实数据集的实验结果,以验证其准确性和可扩展性。
Assigning items to owners is a common problem found in various real-world applications, for example, audience-channel matching in marketing campaigns, borrower-lender matching in loan management, and shopper-merchant matching in e-commerce. Given an objective and multiple constraints, an assignment problem can be formulated as a constrained optimization problem. Such assignment problems are usually NP-hard, so when the number of items or the number of owners is large, solving for exact solutions becomes challenging. In this paper, we are interested in solving constrained assignment problems with hundreds of millions of items. Thus, with just tens of owners, the number of decision variables is at billion-scale. This scale is usually seen in the internet industry, which makes decisions for large groups of users. We relax the possible integer constraint, and formulate a general optimization problem that covers commonly seen assignment problems. Its objective function is convex. Its constraints are either linear, or convex and separable by items. We study to solve our generalized assignment problems in the Bregman Alternating Direction Method of Multipliers (BADMM) framework where we exploit Bregman divergence to transform the Augmented Lagrangian into a separable form, and solve many subproblems in parallel. The entire solution can thus be implemented using a MapReduce-style distributed computation framework. We present experiment results on both synthetic and real-world datasets to verify its accuracy and scalability.