
CVAT Now Supports Video Annotation with SAM 2

Sveta Go
March 21, 2025
Note: This feature is available only to CVAT Enterprise Basic and Premium accounts.

We're excited to announce that CVAT now supports automated video annotation with Segment Anything Model 2 (SAM 2) Tracker. SAM 2 is the successor to Meta AI's advanced foundation model, designed for real-time, comprehensive object segmentation in images and video.

SAM 2, released in July 2024, allows users to detect and segment any object in an image or video based on specific input prompts, like interactive points, bounding boxes, or masks. Once an object is segmented, the model tracks it across video frames in real time, keeping the annotation accurate and consistent.

Evolution of SAM-powered annotation in CVAT

When SAM's first edition was released in 2023, we integrated it into CVAT's SaaS and on-premises versions within weeks, allowing customers to enhance their image annotation tasks. When SAM 2 was released in 2024, we quickly followed suit to provide faster and more accurate segmentation.

An aerial view demonstrating automated segmentation of agricultural fields and crop health conditions using SAM 2 in CVAT.

Building on the success of these integrations, we've now extended SAM 2's object segmentation and tracking capabilities to video.

Let's examine how SAM 2 Tracker works and how it can improve your video annotation workflows.

Bringing the power of SAM 2 to video annotation

Labeling videos is essential for training AI models in industries relying on video data, like autonomous vehicles, sports analysis, and robotics. Those models need a large amount of accurately labeled data—from hundreds to millions of videos—to function reliably.

Labeling data at this scale is challenging. Unlike image labeling, video annotation adds a temporal dimension, a much larger data volume, and the need to keep annotations consistent from frame to frame.

Automated annotation tools like CVAT’s new SAM 2 Tracker ease these challenges by streamlining the process and reducing manual effort.

Besides SAM 2 Tracker, CVAT offers two other methods for annotating videos:

  • Manual, frame-by-frame labeling, which requires drawing annotations on every frame.
  • Interpolation-based labeling, where annotations are placed on keyframes and automatically propagated across intermediate frames.
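
To make the interpolation-based option concrete, here is a minimal sketch of linear keyframe interpolation for bounding boxes. It illustrates the general technique only and is not CVAT's actual implementation; the `Box` class and the `interpolate_boxes` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical structure)."""
    x1: float
    y1: float
    x2: float
    y2: float


def interpolate_boxes(keyframes: dict[int, Box]) -> dict[int, Box]:
    """Linearly fill in a box for every frame between consecutive keyframes."""
    frames = sorted(keyframes)
    result: dict[int, Box] = {}
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        span = end - start
        for frame in range(start, end):
            # t runs from 0 at the first keyframe toward 1 at the next one.
            t = (frame - start) / span
            result[frame] = Box(
                a.x1 + t * (b.x1 - a.x1),
                a.y1 + t * (b.y1 - a.y1),
                a.x2 + t * (b.x2 - a.x2),
                a.y2 + t * (b.y2 - a.y2),
            )
    if frames:
        # The last keyframe is copied as-is; nothing follows it to blend with.
        result[frames[-1]] = keyframes[frames[-1]]
    return result


# Two hand-drawn keyframes, four frames apart: the object moves 20 px right.
keyframes = {0: Box(10, 10, 50, 50), 4: Box(30, 10, 70, 50)}
tracks = interpolate_boxes(keyframes)
print(tracks[2])  # Box(x1=20.0, y1=10.0, x2=60.0, y2=50.0)
```

Because the in-between boxes are a straight-line blend, this approach assumes roughly steady motion between keyframes. An object that changes direction or is hidden partway through needs additional keyframes.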

While those two options remain viable for simpler scenarios or limited-scope projects, SAM 2 Tracker significantly enhances the convenience and speed of object segmentation and tracking in videos, especially for complex scenes with rapid movements or frequent obstructions.

An aerial view demonstrating automated segmentation and tracking of a skid steer loader on a construction site using SAM 2 and CVAT

Key features of CVAT’s SAM 2 Tracker

  • Instant segmentation: The SAM 2 Tracker outlines the contours of an object in a single frame when you click on it.
  • Automatic tracking: The tracker preserves the object's shape and position as it moves across frames.
  • Support for complex objects: Works effectively with partial overlaps or changes in the background.
  • Interactive refinement: Adjust annotations at any stage.

How to use SAM 2 Tracker in CVAT

Setting up object tracking with SAM 2 in CVAT

Follow these steps to segment and track objects in your videos with SAM 2:

  1. Open your CVAT account and select the video you want to annotate from the list of annotation tasks.
  2. In the annotation toolbar, select the "Magic Wand." Then, use the Interactor tab to choose the label and SAM 2 to generate a segmentation mask for your object on the first (zero) frame.
Note: For the Tracker to work, turn on the "Convert the mask into a polygon" slider. A mask cannot be switched to "Track" mode, so the Tracker cannot follow an object annotated with a mask as a single element across multiple frames.
  3. In the right-hand Objects panel, click the three-dot menu (⋮) next to your polygon, then select "Run annotation action."
  4. Select "SAM 2 Tracker" from the pop-up menu and set the number of frames to track. Important: If you annotated the object as a polygon “Shape,” convert it to “Track” mode before running the Tracker.
  5. SAM 2 will track your polygon across subsequent frames.
Note: Due to deployment requirements, SAM 2 Tracker is currently available exclusively for CVAT’s On-Prem paid accounts (Enterprise Basic and Premium).

Getting started

For more information about SAM 2 Tracker, visit our documentation. For SAM 2 details, visit its site and GitHub.

  • If you have an Enterprise account and want to install SAM 2 Tracker, contact our support team.
  • If you don’t have a CVAT On-Prem account or use CVAT Online and want to try SAM 2 for video annotation, contact our sales team.