Paper Title
Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks
Paper Authors
Paper Abstract
Pre-trained language models have shown excellent results in few-shot learning scenarios using in-context learning. Impressive as this is, the size of such models can make them unusable in on-device applications such as sensors or smartphones. With smaller language models, task-specific data annotation is needed to fine-tune the model for a specific purpose. However, data annotation can impose a substantial financial and time burden on small research groups, startups, and even companies. In this paper, we analyze different prompt-based fine-tuning techniques to improve results on both language and multimodal causal transformer models. To evaluate our results, we use a dataset focusing on visual commonsense reasoning in time. Our results show that with simple model-agnostic prompt-based fine-tuning, comparable results can be reached using only 35%-40% of the fine-tuning training dataset. The proposed approaches yield significant savings in time and cost. As the proposed methods make minimal architectural assumptions, other researchers can apply them to their own transformer models with minimal adaptation. We plan to release the source code freely so that the community can use and build on our work.
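To make the idea of model-agnostic, prompt-based fine-tuning concrete, here is a minimal sketch of how contextual data might be folded into training examples purely at the text level. The template and field names below are illustrative assumptions, not the paper's actual prompt format:

```python
# Hypothetical sketch: prompt-based fine-tuning touches only the input text,
# so the same prompt-augmented examples can be fed to any causal transformer
# (language-only or multimodal) without architectural changes.

def build_prompt(context: str, question: str, answer: str) -> str:
    """Wrap a raw (context, question, answer) triple in a prompt template.

    The template here is an assumption for illustration; the key property is
    that it is independent of any particular model architecture.
    """
    return (
        f"Context: {context}\n"
        f"Question: {question}\n"
        f"Answer: {answer}"
    )

def build_dataset(examples):
    """Convert raw triples into prompt strings ready for fine-tuning."""
    return [build_prompt(c, q, a) for c, q, a in examples]

# Toy visual-commonsense-style example (the data itself is made up).
examples = [
    ("A man is holding an umbrella on a street.",
     "Why is he holding it?",
     "It is likely raining."),
]
prompts = build_dataset(examples)
print(prompts[0])
```

Because the augmentation happens before tokenization, the fine-tuning loop itself is unchanged; this is what makes the approach transferable across transformer models with minimal adaptation.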