Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching

Paper Title

Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching

Authors

Huang, Xuhua; Xu, Jiarui; Tai, Yu-Wing; Tang, Chi-Keung

Abstract

Significant progress has been made in Video Object Segmentation (VOS), the video object tracking task at its finest level. While the VOS task can be naturally decoupled into image semantic segmentation and video object tracking, significantly more research effort has gone into segmentation than into tracking. In this paper, we introduce "tracking-by-detection" into VOS, which coherently integrates segmentation into tracking, by proposing a new temporal aggregation network and a novel dynamic time-evolving template matching mechanism that together achieve significantly improved performance. Notably, our method is entirely online and thus suitable for one-shot learning, and our end-to-end trainable model allows multiple object segmentation in one forward pass. We achieve new state-of-the-art performance on the DAVIS benchmark in both speed and accuracy without complicated bells and whistles, with a speed of 0.14 seconds per frame and a J&F measure of 75.9%.
