Paper Title

Bidirectional Contrastive Split Learning for Visual Question Answering

Paper Authors

Yuwei Sun, Hideya Ochiai

Paper Abstract


Visual Question Answering (VQA) based on multi-modal data facilitates real-life applications such as home robots and medical diagnosis. One significant challenge is to devise a robust decentralized learning framework for various client models where centralized data collection is infeasible due to confidentiality concerns. This work aims to tackle privacy-preserving VQA by decoupling a multi-modal model into representation modules and a contrastive module, and by leveraging inter-module gradient sharing and inter-client weight sharing. To this end, we propose Bidirectional Contrastive Split Learning (BiCSL) to train a global multi-modal model on the entire data distribution of decentralized clients. We employ a contrastive loss that enables more efficient self-supervised learning of decentralized modules. Comprehensive experiments are conducted on the VQA-v2 dataset based on five SOTA VQA models, demonstrating the effectiveness of the proposed method. Furthermore, we inspect BiCSL's robustness against a dual-key backdoor attack on VQA. Consequently, BiCSL shows much better robustness to the multi-modal adversarial attack compared to the centralized learning method, which provides a promising approach to decentralized multi-modal learning.
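The abstract describes a symmetric ("bidirectional") contrastive objective between the outputs of decentralized representation modules. A minimal NumPy sketch of such an InfoNCE-style loss is shown below; this is an illustration under assumptions, not the paper's exact formulation — the function name, the temperature value, and the batch-diagonal positive pairing are all illustrative choices.

```python
import numpy as np

def bidirectional_contrastive_loss(img_repr, txt_repr, temperature=0.07):
    """Symmetric InfoNCE-style loss between two modality representations.

    Matched (image, question) pairs on the batch diagonal are treated as
    positives; all other cross pairs in the batch act as negatives.
    """
    # Row-normalize so the dot product is cosine similarity.
    img_repr = img_repr / np.linalg.norm(img_repr, axis=1, keepdims=True)
    txt_repr = txt_repr / np.linalg.norm(txt_repr, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by temperature.
    logits = img_repr @ txt_repr.T / temperature
    n = logits.shape[0]

    def cross_entropy_diag(l):
        # Numerically stable log-softmax over each row; the target for
        # row i is column i (the matched pair).
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return (cross_entropy_diag(logits) + cross_entropy_diag(logits.T)) / 2
```

As a sanity check, feeding identical representations for both modalities should yield a near-zero loss, since every diagonal pair is trivially the best match in its row.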
