A Benchmark and Baseline for Language-Driven Image Editing

Paper Title

A Benchmark and Baseline for Language-Driven Image Editing

Authors

Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu

Abstract

Language-driven image editing can save a great deal of laborious editing work and is friendly to photography novices. However, most similar work handles only a specific image domain or supports only global retouching. To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing and is annotated with editing operations and masks. We also propose a baseline method that fully utilizes these annotations. Our method treats each editing operation as a sub-module and automatically predicts the operation parameters. The approach not only performs well on challenging user data but is also highly interpretable. We believe our work, including both the benchmark and the baseline, will advance image editing toward a more general and free-form level.
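
The abstract describes editing operations as separate sub-modules, each driven by a predicted parameter and applied either globally or inside a mask. The sketch below illustrates that general idea only, and it is not the authors' implementation: the `OPERATIONS` registry, the `apply_edit` function, the two toy operations, and the mask blending rule are all assumptions made for illustration. In the paper's setting the operation, its parameter, and the mask would be predicted from the language request; here they are passed in by hand.

```python
from typing import Callable, Dict, List, Optional

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]
Mask = List[List[float]]   # per-pixel weights in [0, 1]; 1 means "edit here"


def clamp(value: float) -> float:
    """Keep a pixel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, value))


def brightness(pixel: float, amount: float) -> float:
    """Shift a pixel by `amount` (positive brightens, negative darkens)."""
    return clamp(pixel + amount)


def contrast(pixel: float, factor: float) -> float:
    """Scale a pixel's distance from mid-gray by `factor`."""
    return clamp(0.5 + (pixel - 0.5) * factor)


# Hypothetical registry: one sub-module per editing operation.
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "brightness": brightness,
    "contrast": contrast,
}


def apply_edit(image: Image, op_name: str, param: float,
               mask: Optional[Mask] = None) -> Image:
    """Apply one operation; a missing mask means a global edit.

    Assumed blending rule: output = mask * edited + (1 - mask) * original.
    """
    op = OPERATIONS[op_name]
    edited: Image = []
    for r, row in enumerate(image):
        new_row = []
        for c, pixel in enumerate(row):
            weight = 1.0 if mask is None else mask[r][c]
            new_row.append(weight * op(pixel, param) + (1.0 - weight) * pixel)
        edited.append(new_row)
    return edited
```

Calling `apply_edit(image, "brightness", 0.1, mask)` changes only the pixels where the mask is nonzero, while omitting `mask` applies the operation to the whole image. Because each edit is a named operation with an explicit parameter, the sequence of edits can be read back directly, which is one way to understand the interpretability claim in the abstract.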
