The Second Place Solution for The 4th Large-scale Video Object Segmentation Challenge - Track 3: Referring Video Object Segmentation

Paper Title

The Second Place Solution for The 4th Large-scale Video Object Segmentation Challenge--Track 3: Referring Video Object Segmentation

Authors

Leilei Cao, Zhuang Li, Bo Yan, Feng Zhang, Fengliang Qi, Yuchen Hu, Hongbin Wang

Abstract

The referring video object segmentation (RVOS) task aims to segment, in all video frames, the object instances in a given video that are referred to by a language expression. Because it requires understanding cross-modal semantics within individual instances, this task is more challenging than traditional semi-supervised video object segmentation, where the ground-truth object masks in the first frame are given. With the great success of Transformers in object detection and object segmentation, RVOS has made remarkable progress, with ReferFormer achieving state-of-the-art performance. In this work, building on ReferFormer as a strong baseline framework, we propose several tricks to further boost performance, including cyclical learning rates, a semi-supervised approach, and test-time augmentation inference. The improved ReferFormer ranked second in the CVPR 2022 Referring YouTube-VOS Challenge.
