学习通过时间窗口解决多个TSP，并通过深度加强学习拒绝

论文标题

学习通过时间窗口解决多个TSP，并通过深度加强学习拒绝

Learning to Solve Multiple-TSP with Time Window and Rejections via Deep Reinforcement Learning

论文作者

Zhang, Rongkai, Zhang, Cong, Cao, Zhiguang, Song, Wen, Tan, Puay Siew, Zhang, Jie, Wen, Bihan, Dauwels, Justin

论文摘要

我们提出了一个基于深入强化学习的经理工作框架，以解决旅行推销员问题（TSP），\ ie〜多车辆TSP的艰难而又非平凡的变体，其中有时间窗口和拒绝（MTSPTWR）（MTSPTWR），在该期限之前无法在截止日期之前服务的客户将受到拒绝。特别是，在拟议的框架中，经理代理人通过基于图形同构网络（GIN）的策略网络将客户分配给每辆车，从而将MTSPTWR分为子路由任务。工人代理商学会了通过根据每辆车的旅行长度和拒绝率来最大程度地降低成本来解决子路由任务，然后将最大的最大值送回经理代理，以学习更好的任务。实验结果表明，所提出的框架在更高的解决方案质量和较短的计算时间方面优于强基础。更重要的是，训练有素的代理商还取得了竞争性能，以解决看不见的较大实例。

We propose a manager-worker framework based on deep reinforcement learning to tackle a hard yet nontrivial variant of Travelling Salesman Problem (TSP), \ie~multiple-vehicle TSP with time window and rejections (mTSPTWR), where customers who cannot be served before the deadline are subject to rejections. Particularly, in the proposed framework, a manager agent learns to divide mTSPTWR into sub-routing tasks by assigning customers to each vehicle via a Graph Isomorphism Network (GIN) based policy network. A worker agent learns to solve sub-routing tasks by minimizing the cost in terms of both tour length and rejection rate for each vehicle, the maximum of which is then fed back to the manager agent to learn better assignments. Experimental results demonstrate that the proposed framework outperforms strong baselines in terms of higher solution quality and shorter computation time. More importantly, the trained agents also achieve competitive performance for solving unseen larger instances.

下载PDF全文

下载文献需遵守相关版权规定

论文标题