Paper Title

On the Adversarial Robustness of Mixture of Experts

Paper Authors

Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli

Paper Abstract

Adversarial robustness is a key desirable property of neural networks. It has been empirically shown to be affected by their sizes, with larger networks being typically more robust. Recently, Bubeck and Sellke proved a lower bound on the Lipschitz constant of functions that fit the training data in terms of their number of parameters. This raises an interesting open question, do -- and can -- functions with more parameters, but not necessarily more computational cost, have better robustness? We study this question for sparse Mixture of Expert models (MoEs), that make it possible to scale up the model size for a roughly constant computational cost. We theoretically show that under certain conditions on the routing and the structure of the data, MoEs can have significantly smaller Lipschitz constants than their dense counterparts. The robustness of MoEs can suffer when the highest weighted experts for an input implement sufficiently different functions. We next empirically evaluate the robustness of MoEs on ImageNet using adversarial attacks and show they are indeed more robust than dense models with the same computational cost. We make key observations showing the robustness of MoEs to the choice of experts, highlighting the redundancy of experts in models trained in practice.
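Since the abstract hinges on sparse MoE layers routing each input to only a few experts, here is a minimal NumPy sketch of top-k expert routing. It is not the authors' implementation; all layer sizes, names, and the two-layer MLP experts are illustrative assumptions. It only shows why per-token compute stays roughly constant while total parameters grow with the number of experts.

```python
# Minimal sketch (assumed, not the paper's code) of a sparse top-k
# Mixture-of-Experts layer. Each token is routed to its k highest-scoring
# experts, so only k expert MLPs run per token regardless of num_experts.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden = 16, 32     # illustrative layer sizes
num_experts, top_k = 8, 2      # many experts, only k used per token

# Router and expert parameters (random init, for illustration only).
W_router = rng.normal(size=(d_model, num_experts))
experts = [
    (rng.normal(size=(d_model, d_hidden)), rng.normal(size=(d_hidden, d_model)))
    for _ in range(num_experts)
]

def expert_forward(x, params):
    """One expert: a small two-layer MLP with ReLU."""
    W1, W2 = params
    return np.maximum(x @ W1, 0.0) @ W2

def moe_forward(x):
    """Route one token x (shape [d_model]) to its top-k experts and
    combine their outputs with renormalized gate weights."""
    logits = x @ W_router
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()                      # softmax over experts
    top = np.argsort(gates)[-top_k:]          # indices of the top-k experts
    weights = gates[top] / gates[top].sum()   # renormalize selected gates
    out = np.zeros_like(x)
    for w, e in zip(weights, top):
        out += w * expert_forward(x, experts[e])  # only k experts execute
    return out

token = rng.normal(size=(d_model,))
print(moe_forward(token).shape)  # (16,)
```

Adding more experts enlarges `experts` (and thus the parameter count) without changing the number of expert evaluations per token, which is the "more parameters at roughly constant computational cost" setting the paper studies.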
