Paper Title
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
Paper Authors
Paper Abstract
Recent research that applies Transformer-based architectures to image captioning has resulted in state-of-the-art image captioning performance, capitalising on the success of Transformers on natural language tasks. Unfortunately, though these models work well, one major flaw is their large model sizes. To this end, we present three parameter reduction methods for image captioning Transformers: Radix Encoding, cross-layer parameter sharing, and attention parameter sharing. By combining these methods, our proposed ACORT models have 3.7x to 21.6x fewer parameters than the baseline model without compromising test performance. Results on the MS-COCO dataset demonstrate that our ACORT models are competitive against baselines and SOTA approaches, achieving CIDEr scores of at least 126. Finally, we present qualitative results and ablation studies to further demonstrate the efficacy of the proposed changes. Code and pre-trained models are publicly available at https://github.com/jiahuei/sparse-image-captioning.
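One of the three methods named in the abstract, cross-layer parameter sharing, can be illustrated with a toy sketch: reusing a single layer's weights across all stack positions divides the per-layer parameter cost by the stack depth. The class and function names below are hypothetical illustrations, not code from the paper, and the parameter counts are made-up sizes chosen only to show the arithmetic.

```python
# Minimal sketch of cross-layer parameter sharing, assuming a toy
# stand-in "layer" rather than a real Transformer block.

class ToyLayer:
    """Stand-in for a Transformer layer with a fixed parameter count."""
    def __init__(self, n_params=1_000_000):
        self.n_params = n_params

def build_stack(num_layers, share=False):
    """Build a layer stack; with sharing, one layer object is reused."""
    if share:
        shared = ToyLayer()
        return [shared] * num_layers  # every position aliases the same weights
    return [ToyLayer() for _ in range(num_layers)]

def count_params(stack):
    """Count unique parameters: a shared layer is counted only once."""
    unique = {id(layer): layer.n_params for layer in stack}
    return sum(unique.values())

print(count_params(build_stack(6)))              # 6000000
print(count_params(build_stack(6, share=True)))  # 1000000
```

With six layers, sharing yields a 6x reduction in layer parameters at no extra inference cost, which is the same flavour of saving (though not the exact mechanism or numbers) behind ACORT's reported 3.7x to 21.6x overall reductions.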