UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning

Paper Title

UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning

Authors

Lam, Quan Hoang; Le, Quang Duy; Van Nguyen, Kiet; Nguyen, Ngan Luu-Thuy

Abstract

Image Captioning, the task of automatically generating captions for images, has in recent years attracted attention from researchers in many fields of computer science, namely computer vision, natural language processing, and machine learning. This paper contributes to research on the Image Captioning task by extending the dataset to a different language, Vietnamese. So far, no Image Captioning dataset exists for the Vietnamese language, so this is the foremost fundamental step for developing Vietnamese Image Captioning. In this scope, we first build a dataset that contains manually written captions for images from the Microsoft COCO dataset relating to sports played with balls; we call this dataset UIT-ViIC. UIT-ViIC consists of 19,250 Vietnamese captions for 3,850 images. Following that, we evaluate our dataset on deep neural network models and compare it with the English dataset and two Vietnamese datasets built by different methods. UIT-ViIC is published on our lab website for research purposes.
