Paper Title

Dual Policy Distillation

Authors

Kwei-Herng Lai, Daochen Zha, Yuening Li, Xia Hu

Abstract

Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging deep reinforcement learning tasks. This teacher-student framework requires a well-trained teacher model, which is computationally expensive to obtain. Moreover, the performance of the student model can be limited by the teacher model if the teacher model is not optimal. In light of collaborative learning, we study the feasibility of involving joint intellectual efforts from diverse perspectives of student models. In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment and extract knowledge from each other to enhance their learning. The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms, since it is unclear whether the knowledge distilled from an imperfect and noisy peer learner would be helpful. To address this challenge, we theoretically justify that distilling knowledge from a peer learner leads to policy improvement, and propose a disadvantageous distillation strategy based on the theoretical results. Experiments conducted on several continuous control tasks show that the proposed framework achieves superior performance with a learning-based agent and function approximation, without the use of expensive teacher models.
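
The abstract describes the mechanism only at a high level: two peer learners train on the same environment and each distills knowledge from the other, with distillation targeted at states where the peer appears to do better (the "disadvantageous" strategy). The PyTorch snippet below is a minimal, hypothetical sketch of that idea, not the authors' implementation; the network shapes, the MSE action-matching loss, and the use of value estimates to decide where a learner is disadvantaged are all assumptions made for illustration.

```python
# Minimal sketch of student-student (dual) policy distillation, based only on
# the abstract's description. All specifics below are illustrative assumptions.
import torch
import torch.nn as nn

class Learner(nn.Module):
    """One peer: a deterministic policy plus a state-value estimate."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))
        self.value = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

def disadvantageous_distillation_loss(learner, peer, states):
    """Match the peer's actions only at states where the learner's own value
    estimate is lower than the peer's, i.e. where it looks 'disadvantaged'."""
    with torch.no_grad():
        peer_actions = peer.policy(states)
        disadvantaged = (peer.value(states) > learner.value(states)).float()
    action_gap = (learner.policy(states) - peer_actions).pow(2).mean(dim=-1, keepdim=True)
    return (disadvantaged * action_gap).mean()

# Toy usage: two peers on the same environment would alternate between their
# usual RL update (omitted here) and this mutual distillation step.
obs_dim, act_dim = 8, 2
a, b = Learner(obs_dim, act_dim), Learner(obs_dim, act_dim)
opt_a = torch.optim.Adam(a.parameters(), lr=3e-4)
opt_b = torch.optim.Adam(b.parameters(), lr=3e-4)

batch = torch.randn(32, obs_dim)  # stand-in for states sampled by either learner
for learner, peer, opt in ((a, b, opt_a), (b, a, opt_b)):
    loss = disadvantageous_distillation_loss(learner, peer, batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
```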
