Ensembles of Vision Transformers as a New Paradigm for Automated Classification in Ecology

Paper Title

Ensembles of Vision Transformers as a New Paradigm for Automated Classification in Ecology

Authors

Kyathanahally, S., Hardeman, T., Reyes, M., Merz, E., Bulas, T., Brun, P., Pomati, F., Baity-Jesi, M.

Abstract

Monitoring biodiversity is paramount to manage and protect natural resources. Collecting images of organisms over large temporal or spatial scales is a promising practice to monitor the biodiversity of natural ecosystems, providing large amounts of data with minimal interference with the environment. Deep learning models are currently used to automate classification of organisms into taxonomic units. However, imprecision in these classifiers introduces a measurement noise that is difficult to control and can significantly hinder the analysis and interpretation of data. We overcome this limitation through ensembles of Data-efficient image Transformers (DeiTs), which are not only easy to train and implement but also significantly outperform the previous state of the art (SOTA). We validate our results on ten ecological imaging datasets of diverse origin, ranging from plankton to birds. On all the datasets, we achieve a new SOTA, with a reduction of the error with respect to the previous SOTA ranging from 29.35% to 100.00%, and often achieving performance very close to perfect classification. Ensembles of DeiTs perform better not because of superior single-model performance but rather due to smaller overlaps in the predictions by independent models and lower top-1 probabilities. This increases the benefit of ensembling, especially when using geometric averages to combine individual learners. While we only test our approach on biodiversity image datasets, our approach is generic and can be applied to any kind of images.
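The abstract's point about geometric averaging can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ensemble_predict`, the small `eps` guard against `log(0)`, and the final renormalization are assumptions made for illustration, and the probabilities are made-up numbers rather than outputs of trained DeiTs.

```python
import numpy as np


def ensemble_predict(probs, method="geometric", eps=1e-12):
    """Combine per-model class probabilities into one prediction.

    probs: array-like of shape (n_models, n_samples, n_classes).
    Returns (labels, combined), where combined has shape
    (n_samples, n_classes) and each row sums to 1.
    """
    probs = np.asarray(probs, dtype=float)
    if method == "arithmetic":
        combined = probs.mean(axis=0)
    elif method == "geometric":
        # Geometric mean computed in log space; eps (an assumption)
        # avoids log(0) when a model assigns zero probability.
        combined = np.exp(np.log(probs + eps).mean(axis=0))
    else:
        raise ValueError(f"unknown method: {method}")
    # Renormalize so each row is a probability distribution again.
    combined = combined / combined.sum(axis=-1, keepdims=True)
    return combined.argmax(axis=-1), combined


# Toy example: three hypothetical models, one image, two classes.
# Two models moderately favor class 0; one is almost certain of class 1.
toy_probs = [
    [[0.900, 0.100]],
    [[0.900, 0.100]],
    [[0.001, 0.999]],
]

arith_labels, _ = ensemble_predict(toy_probs, method="arithmetic")
geo_labels, _ = ensemble_predict(toy_probs, method="geometric")
print("arithmetic:", arith_labels, "geometric:", geo_labels)
```

In this toy case the arithmetic mean follows the two-model majority, while the geometric mean sides with the class that no model rules out, because a single near-zero probability pulls a class's geometric mean down sharply. The choice of averaging rule therefore matters most when the individual models disagree.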
