SAT-MARL: Specification Aware Training in Multi-Agent Reinforcement Learning

Paper Title

SAT-MARL: Specification Aware Training in Multi-Agent Reinforcement Learning

Authors

Fabian Ritz, Thomy Phan, Robert Müller, Thomas Gabor, Andreas Sedlmeier, Marc Zeller, Jan Wieghardt, Reiner Schmid, Horst Sauer, Cornel Klein, Claudia Linnhoff-Popien

Abstract

A characteristic of reinforcement learning is the ability to develop unforeseen strategies when solving problems. While such strategies sometimes yield superior performance, they may also result in undesired or even dangerous behavior. In industrial scenarios, a system's behavior also needs to be predictable and lie within defined ranges. To enable the agents to learn (how) to align with a given specification, this paper proposes to explicitly transfer functional and non-functional requirements into shaped rewards. Experiments are carried out on the smart factory, a multi-agent environment modeling an industrial lot-size-one production facility, with up to eight agents and different multi-agent reinforcement learning algorithms. Results indicate that compliance with functional and non-functional constraints can be achieved by the proposed approach.
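The idea of turning requirements into shaped rewards can be illustrated with a minimal sketch. It is not the paper's reward function: the `StepInfo` fields, the choice of "task completed" as the functional requirement and "no collision" as the non-functional one, and all weights are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step summary for one agent (not taken from the paper)."""
    task_completed: bool  # assumed functional requirement: the agent finished its task
    collided: bool        # assumed non-functional requirement: the agent must not collide


def shaped_reward(info: StepInfo,
                  completion_bonus: float = 1.0,
                  collision_penalty: float = 0.5,
                  step_penalty: float = 0.01) -> float:
    """Combine requirement terms into one scalar reward (illustrative weights)."""
    reward = -step_penalty            # small cost per step to discourage idling
    if info.task_completed:
        reward += completion_bonus    # reward for meeting the functional requirement
    if info.collided:
        reward -= collision_penalty   # penalty for violating the non-functional requirement
    return reward


if __name__ == "__main__":
    print(shaped_reward(StepInfo(task_completed=True, collided=False)))
    print(shaped_reward(StepInfo(task_completed=False, collided=True)))
```

In a sketch like this, the relative size of the bonus and the penalties determines how the agent trades task performance against compliance, so the weights would need tuning for any concrete environment.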
