StyleBabel: Artistic Style Tagging and Captioning

Paper Title

StyleBabel: Artistic Style Tagging and Captioning

Authors

Dan Ruta, Andrew Gilbert, Pranav Aggarwal, Naveen Marri, Ajinkya Kale, Jo Briggs, Chris Speed, Hailin Jin, Baldo Faieta, Alex Filipkowski, Zhe Lin, John Collomosse

Abstract

We present StyleBabel, a unique open-access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools. StyleBabel was collected via an iterative method inspired by "Grounded Theory": a qualitative approach that enables annotation while co-evolving a shared language for fine-grained artistic style attribute description. We demonstrate several downstream tasks for StyleBabel, adapting the recent ALADIN architecture for fine-grained style similarity to train cross-modal embeddings for: 1) free-form tag generation; 2) natural language description of artistic style; 3) fine-grained text search of style. To do so, we extend ALADIN with recent advances in Visual Transformer (ViT) and cross-modal representation learning, achieving state-of-the-art accuracy in fine-grained style retrieval.
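To make the third task, fine-grained text search of style, more concrete, here is a minimal sketch of retrieval in a shared text-image embedding space. It is not the authors' implementation. It assumes a text query and each artwork have already been mapped to vectors of the same dimension by some cross-modal model, and it ranks artworks by cosine similarity to the query. The names `cosine_similarity`, `rank_by_style`, `TOY_IMAGE_EMBEDDINGS`, and `TOY_QUERY`, and the three-dimensional vectors, are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_style(query_embedding, image_embeddings, top_k=3):
    """Return the top_k (name, score) pairs, most similar to the query first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-written embeddings standing in for model outputs.
TOY_IMAGE_EMBEDDINGS = {
    "artwork_a": [0.9, 0.1, 0.0],
    "artwork_b": [0.1, 0.9, 0.2],
    "artwork_c": [0.7, 0.3, 0.1],
}

# Hypothetical embedding of a text query describing a style.
TOY_QUERY = [1.0, 0.0, 0.0]

if __name__ == "__main__":
    for name, score in rank_by_style(TOY_QUERY, TOY_IMAGE_EMBEDDINGS, top_k=2):
        print(name, round(score, 3))
```

In a real system the vectors would come from trained text and image encoders, and an approximate nearest-neighbour index would typically replace the exhaustive loop for a collection of 135K artworks.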
