Paper Title
Scattered or Connected? An Optimized Parameter-efficient Tuning Approach for Information Retrieval
Paper Authors
Paper Abstract
Pre-training and fine-tuning have achieved significant advances in information retrieval (IR). A typical approach is to fine-tune all the parameters of a large-scale pre-trained model (PTM) on downstream tasks. As model size and the number of tasks grow, such an approach becomes less feasible and prohibitively expensive. Recently, a variety of parameter-efficient tuning methods have been proposed in natural language processing (NLP) that fine-tune only a small number of parameters while still attaining strong performance. Yet there has been little effort to explore parameter-efficient tuning for IR. In this work, we first conduct a comprehensive study of existing parameter-efficient tuning methods at both the retrieval and re-ranking stages. Unlike the promising results in NLP, we find that these methods cannot achieve performance comparable to full fine-tuning at either stage when updating less than 1% of the original model parameters. More importantly, we find that the existing methods are merely parameter-efficient, but not learning-efficient, as they suffer from unstable training and slow convergence. To analyze the underlying reason, we conduct a theoretical analysis and show that the separation of the inserted trainable modules makes optimization difficult. To alleviate this issue, we propose to inject additional modules alongside the PTM to connect the originally scattered modules. In this way, all the trainable modules form a pathway that smooths the loss surface and thus helps stabilize the training process. Experiments at both the retrieval and re-ranking stages show that our method significantly outperforms existing parameter-efficient methods and achieves performance comparable to, or even better than, full fine-tuning.
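To make the contrast in the abstract concrete, below is a minimal PyTorch sketch of the setup it describes: standard "scattered" bottleneck adapters inserted per layer of a frozen PTM, plus a hypothetical cross-layer gating scheme that links them into a single trainable pathway. The class names (`Adapter`, `ConnectedAdapters`), the gating mechanism, and all hyperparameters are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Standard bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen PTM representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


class ConnectedAdapters(nn.Module):
    """Per-layer adapters plus learned gates that carry each adapter's output
    forward to the next layer, so the otherwise scattered trainable modules
    form a single pathway alongside the frozen PTM (illustrative sketch only)."""

    def __init__(self, num_layers: int, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.adapters = nn.ModuleList(
            Adapter(hidden_dim, bottleneck_dim) for _ in range(num_layers)
        )
        # One learnable gate per layer controls how strongly the previous
        # adapter's output is mixed into the current one.
        self.gates = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_outputs: list[torch.Tensor]) -> torch.Tensor:
        # layer_outputs[i]: hidden states produced by the i-th frozen PTM layer.
        pathway = None
        for i, hidden in enumerate(layer_outputs):
            out = self.adapters[i](hidden)
            if pathway is not None:
                # Cross-layer link: mix in the previous adapter's output.
                out = out + torch.sigmoid(self.gates[i]) * pathway
            pathway = out
        return pathway


if __name__ == "__main__":
    # Toy check: 12 "layers" of random hidden states, batch 2, seq len 8, dim 768.
    layers = [torch.randn(2, 8, 768) for _ in range(12)]
    module = ConnectedAdapters(num_layers=12, hidden_dim=768)
    print(module(layers).shape)  # torch.Size([2, 8, 768])
```

In such a setup, only the adapters and gates would be trained; the PTM's own parameters would stay frozen (e.g., via `requires_grad_(False)`), which is what makes the approach parameter-efficient.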