Paper Title
SuperAnimal pretrained pose estimation models for behavioral analysis
Authors
Abstract
Quantification of behavior is critical in applications ranging from neuroscience and veterinary medicine to animal conservation efforts. A common first step in behavioral analysis is extracting relevant keypoints on animals, known as pose estimation. However, reliable inference of poses currently requires domain knowledge and manual labeling effort to build supervised models. We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models that can be used on over 45 species without additional human labels. Concretely, we introduce a method to unify the keypoint space across differently labeled datasets (via our generalized data converter) and to train on these diverse datasets in a way that avoids catastrophically forgetting keypoints despite the unbalanced inputs (via our keypoint gradient masking and memory replay approaches). These models show excellent performance across six pose benchmarks. Then, to ensure maximal usability for end-users, we demonstrate how to fine-tune the models on differently labeled data and provide tooling for unsupervised video adaptation that boosts performance and decreases jitter across frames. When fine-tuned, SuperAnimal models are 10-100× more data efficient than prior transfer-learning-based approaches. We illustrate the utility of our models in behavioral classification in mice and gait analysis in horses. Collectively, this presents a data-efficient solution for animal pose estimation.
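The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the general idea behind two of the named ingredients, under stated assumptions. It assumes that unifying the keypoint space means renaming each dataset's keypoints into one shared keypoint list and recording which entries that dataset actually labels. It also assumes that keypoint gradient masking means weighting the loss by that "labeled" mask, so keypoints a dataset does not annotate contribute exactly zero gradient. `SUPER_KEYPOINTS`, `to_super_space`, `masked_mse_and_grad` and the `rename` table are hypothetical names, not the SuperAnimal or DeepLabCut API. For brevity the sketch uses a coordinate-regression loss, which may differ from the loss the authors use, and it does not show memory replay.

```python
import numpy as np

# HYPOTHETICAL unified ("super") keypoint list, for illustration only.
SUPER_KEYPOINTS = ["nose", "left_ear", "right_ear", "neck", "tail_base"]


def to_super_space(coords, names, rename=None):
    """Convert one dataset's keypoints into the unified keypoint space.

    coords: (K_d, 2) array of (x, y) for the dataset's own keypoints.
    names:  the K_d dataset-specific keypoint names.
    rename: optional dict mapping dataset names to unified names.
    Returns a (K, 2) array (NaN where unlabeled) and a (K,) boolean mask.
    """
    rename = rename or {}
    index = {name: i for i, name in enumerate(SUPER_KEYPOINTS)}
    out = np.full((len(SUPER_KEYPOINTS), 2), np.nan)
    mask = np.zeros(len(SUPER_KEYPOINTS), dtype=bool)
    for (x, y), name in zip(coords, names):
        unified = rename.get(name, name)
        if unified in index:  # keypoints with no unified counterpart are dropped
            out[index[unified]] = (x, y)
            mask[index[unified]] = True
    return out, mask


def masked_mse_and_grad(pred, target, mask):
    """Mean squared error over labeled keypoints only, plus its gradient.

    Keypoints the source dataset does not annotate get weight 0, so their
    gradient is exactly zero instead of pushing the model toward any target.
    """
    w = mask[:, None].astype(float)                      # (K, 1) weights
    diff = np.where(mask[:, None], pred - target, 0.0)   # hide NaN targets
    n = max(w.sum() * pred.shape[1], 1.0)                # labeled coordinates
    loss = float((w * diff ** 2).sum() / n)
    grad = 2.0 * w * diff / n                            # d(loss)/d(pred)
    return loss, grad


# Usage: a dataset that labels only "snout" (renamed to "nose") and "tail_base".
target, mask = to_super_space(
    np.array([[10.0, 20.0], [30.0, 40.0]]),
    ["snout", "tail_base"],
    rename={"snout": "nose"},
)
pred = np.zeros((len(SUPER_KEYPOINTS), 2))
loss, grad = masked_mse_and_grad(pred, target, mask)
print(mask.tolist())   # [True, False, False, False, True]
print(loss)            # 750.0
```

The design choice worth noting is that an unlabeled keypoint is treated as "unknown" (zero weight) rather than "absent". Otherwise a model trained on a mix of datasets would learn to suppress keypoints simply because one dataset never annotated them.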