Segment Anything for Videos: A Systematic Survey

Abstract

The recent wave of foundation models has achieved tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) sparking widespread interest in task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is challenging numerous traditional paradigms in CV, delivering strong performance not only in various image segmentation and multi-modal segmentation (e.g., text-to-mask) tasks, but also in the video domain. In addition, the recently released SAM 2 is once again stimulating research on promptable visual segmentation for both images and videos. However, existing surveys mainly focus on SAM in image processing tasks, and a comprehensive, in-depth review of the video domain is notably absent. To address this gap, this work conducts a systematic review of SAM for videos in the era of foundation models. As the first review of the progress of SAM for videos, this work focuses on its applications to various tasks, discussing recent advances and the opportunities for developing foundation models for broad applications. We begin with a brief introduction to the background of SAM and video-related research domains. Subsequently, we present a systematic taxonomy that categorizes existing methods into three key areas (video understanding, video generation, and video editing) and analyze and summarize their advantages and limitations. Furthermore, we provide comparative results of SAM-based and current state-of-the-art methods on representative benchmarks, together with an analysis of those results. Finally, we discuss the challenges faced by current research and envision several future research directions in the field of SAM for video and beyond.

Author and article information

Publication date Created: 30 July 2024

Article

arXiv ID: 2408.08315

SO-VID: a7363668-45ac-4d76-8dd8-cd72adf5604c

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments https://github.com/983632847/SAM-for-Videos

Categories cs.CV cs.AI

ScienceOpen disciplines: Computer vision & Pattern recognition, Artificial intelligence
