# CVAT AI agents: What's new?

March 26, 2025

In January, we announced AI agents: a new feature for integrating your machine learning models with CVAT. Since then, we've been working hard on improving this feature. In this post, we'd like to share our progress.

But first, a quick recap: an auto-annotation (AA) function is a Python object that acts as an adapter between the CVAT SDK and your ML model. Once implemented, you can either:

- use the `cvat-cli task auto-annotate` command to annotate a CVAT task (as shown in the sketch just below), or
- register it with CVAT and then start an agent via the `cvat-cli function create-native` and `cvat-cli function run-agent` commands. After that, use the CVAT UI to select a task to annotate and set the annotation parameters. If you register the function in an organization, other members can use it as well.
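For reference, the direct workflow looks like this. The sketch below assumes your function lives in a file called `my_func.py` (a made-up name), and `<TASK_ID>` stands for the ID of the task to annotate:

```
cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    task auto-annotate <TASK_ID> --function-file my_func.py
```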
Now, let's see what's new.

## Skeleton support

An AA function may output shapes of any type that CVAT supports. However, when agents debuted, AA functions couldn't output skeletons with the agent-based workflow. As of CVAT and CVAT CLI version 2.32.0, this has been rectified.

Let's implement an AA function with skeleton outputs using a YOLO11 pose model from the Ultralytics library and see how it works.

1) We'll start with an empty source file `yolo11_pose_func.py` and add the necessary imports:

```python
import PIL.Image

from ultralytics import YOLO

import cvat_sdk.auto_annotation as cvataa
import cvat_sdk.models as models
```

2) Then, we need to create an instance of the YOLO model:

```python
_model = YOLO("yolo11n-pose.pt")
```

3) To turn our file into an AA function, we need to add two things. First, a spec: a description of the labels that the function will output.

```python
spec = cvataa.DetectionFunctionSpec(
    labels=[
        cvataa.skeleton_label_spec(name, id, [
            cvataa.keypoint_spec(kp_name, kp_id)
            for kp_id, kp_name in enumerate([
                "Nose", "Left Eye", "Right Eye", "Left Ear", "Right Ear",
                "Left Shoulder", "Right Shoulder", "Left Elbow", "Right Elbow",
                "Left Wrist", "Right Wrist", "Left Hip", "Right Hip",
                "Left Knee", "Right Knee", "Left Ankle", "Right Ankle",
            ])
        ])
        for id, name in _model.names.items()
    ],
)
```

For each class that the model supports, we use `skeleton_label_spec` to create the corresponding label spec, and `keypoint_spec` to create a sublabel spec for each keypoint. The Ultralytics library doesn't provide a way to dynamically determine the supported keypoints, so we have to hardcode them in our function. Our hardcoded list is taken from Ultralytics's pose estimation documentation.

Note that each label and sublabel spec requires a distinct numeric ID. For the skeleton as a whole, we use the class ID that Ultralytics gives us, whereas for the sublabels we just assign sequential IDs.
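If you want to double-check the result, you can inspect the spec from a Python shell. This is a quick optional sketch, assuming the spec helpers expose the underlying label fields (`id`, `name`, `sublabels`):

```python
# Optional sanity check: list each skeleton label and its keypoint count.
import yolo11_pose_func

for label in yolo11_pose_func.spec.labels:
    print(label.id, label.name, len(label.sublabels))
```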
The other thing we need to add is a detect function that performs the actual inference.

```python
def detect(
    context: cvataa.DetectionFunctionContext, image: PIL.Image.Image
) -> list[models.LabeledShapeRequest]:
    conf_threshold = 0.5 if context.conf_threshold is None else context.conf_threshold

    return [
        cvataa.skeleton(
            int(label.item()),
            [
                cvataa.keypoint(kp_index, kp.tolist(), outside=kp_conf.item() < 0.5)
                for kp_index, (kp, kp_conf) in enumerate(zip(kps, kp_confs))
            ],
        )
        for result in _model.predict(source=image, conf=conf_threshold)
        for label, kps, kp_confs in zip(
            result.boxes.cls, result.keypoints.xy, result.keypoints.conf
        )
    ]
```

The first thing `detect` does is determine the confidence threshold the user has specified (defaulting to 0.5 if they didn't specify one). Then it calls the model and creates a CVAT skeleton shape for each detection returned by the model. Keypoints with low confidence are marked with the `outside` property so that CVAT hides them from view.
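Before connecting the function to CVAT, you can smoke-test it locally. Since `detect` only reads `conf_threshold` from its context, a simple stand-in object is enough for a quick check. This is an illustrative sketch only: the stand-in is not a real `DetectionFunctionContext`, and `person.jpg` is a hypothetical test image:

```python
# Quick local smoke test for yolo11_pose_func.py (illustrative only).
# detect() only accesses context.conf_threshold, so a stand-in object suffices.
import types

import PIL.Image

import yolo11_pose_func

context = types.SimpleNamespace(conf_threshold=None)  # stand-in context object
shapes = yolo11_pose_func.detect(context, PIL.Image.open("person.jpg"))

for shape in shapes:
    print(f"skeleton with label {shape.label_id}: {len(shape.elements)} keypoints")
```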
Having implemented our function, we can integrate it with CVAT in the usual way. First, we'll need to install the CVAT CLI and the Ultralytics library:

```
pip install cvat-cli ultralytics
```

Then, we can register our function with CVAT and run an agent for it:

```
cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function create-native "YOLO11n-pose" --function-file yolo11_pose_func.py
cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function run-agent <FUNCTION_ID> --function-file yolo11_pose_func.py
```

where:

- `<CVAT_BASE_URL>` is the URL of the CVAT instance you want to use (such as https://app.cvat.ai);
- `<USERNAME>` and `<PASSWORD>` are your credentials;
- `<FUNCTION_ID>` is the number output by the first command.

Now we can try out our function. To do so, we'll need to open CVAT and create a new task with a skeleton label (or add such a label to an existing task). To this label, we'll add keypoints corresponding to the keypoints in the model.

Now we can open the task page, click Actions → Automatic annotation, and select our model.

Pressing "Annotate" will begin the annotation process. Once it's complete, we can open a frame from the task and see the results.

## Attribute support

CVAT allows shapes to include attributes: extra pieces of data that pertain to a given object. The set of allowed attributes is configured per label, as are each attribute's type and allowed values. Skeleton keypoints may have individual attributes as well.

As of CVAT and CVAT CLI version 2.31.0, AA functions may define labels with attributes and output shapes with attributes. This works with both the direct and agent-based annotation workflows.

To demonstrate this feature, we'll implement a function that recognizes text via the EasyOCR library. The function will output rectangle shapes, each with an attribute containing the recognized text string.

Here's the function code:
```python
import PIL.Image

import easyocr
import numpy as np

import cvat_sdk.auto_annotation as cvataa
import cvat_sdk.models as models
from cvat_sdk.attributes import attribute_vals_from_dict

_reader = easyocr.Reader(['en'])

spec = cvataa.DetectionFunctionSpec(
    labels=[
        cvataa.label_spec("text", 0, type="rectangle", attributes=[
            cvataa.text_attribute_spec("string", 0),
        ])
    ],
)

def detect(
    context: cvataa.DetectionFunctionContext, image: PIL.Image.Image
) -> list[models.LabeledShapeRequest]:
    conf_threshold = 0.5 if context.conf_threshold is None else context.conf_threshold

    input = np.array(image.convert('RGB'))[:, :, ::-1]  # EasyOCR expects BGR

    return [
        cvataa.rectangle(
            0,
            list(map(float, [*points[0], *points[2]])),
            attributes=attribute_vals_from_dict({0: string}),
        )
        for points, string, conf in _reader.readtext(input)
        if conf >= conf_threshold
    ]
```

It has the same basic elements as our previous function. To output our attribute, we declare it in the spec (via the `attributes` argument of `label_spec`) and specify its value for each output rectangle (via the `attributes` argument of `rectangle`). EasyOCR returns each detection's bounding box as four corner points; since we need an axis-aligned rectangle, we take the first and third corners (top-left and bottom-right).

Note that since we have only one label and one attribute, we just hardcode 0 as the ID for both of them.
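As an aside, `attribute_vals_from_dict` is a small convenience helper: it takes a mapping of attribute spec IDs to values and produces the list of attribute value objects that the CVAT API expects. A minimal standalone sketch, with made-up values for two hypothetical attribute specs (IDs 0 and 1):

```python
from cvat_sdk.attributes import attribute_vals_from_dict

# Hypothetical example: values for two attribute specs, keyed by spec ID.
vals = attribute_vals_from_dict({0: "hello world", 1: "0.93"})
print(vals)  # one attribute value object per dictionary entry
```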
To see our function in action, we'll need to install the dependencies:

```
pip install cvat-cli easyocr
```

Then, as before, register the function and run an agent:

```
cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function create-native "EasyOCR" --function-file easyocr_func.py
cvat-cli --server-host <CVAT_BASE_URL> --auth <USERNAME>:<PASSWORD> \
    function run-agent <FUNCTION_ID> --function-file easyocr_func.py
```

We'll need to create a task with a label and an attribute that match our function.

Then press "Actions → Automatic annotation", select the model, confirm, and see the results.

## Label types

The last improvement we'd like to discuss is subtle. Let's revisit the function spec from the previous section:

```python
cvataa.label_spec("text", 0, type="rectangle", ...
```

The ability to specify a label type in a spec has been present since the introduction of the AA function interface, but it was previously ignored. Since CVAT and CVAT CLI 2.29, specifying this type has small but beneficial effects:

- When you select which task label to map a function's label to in the "Automatic annotation" dialog, CVAT will only offer task labels whose type is compatible with the function label's type. For example, if the function label has type "rectangle," then CVAT will only offer task labels of types "rectangle" and "any." This prevents you from accidentally adding shapes of an unwanted type via automatic annotation.
- When you use the "From model" button to copy labels from a function to a task, the specified label type will be set on the copied label.
- If a function outputs a shape whose type is inconsistent with the declared type of the shape's label, the CVAT CLI will abort the automatic annotation. This catches function implementation mistakes.

Label type declarations are optional. By default, each label spec is assumed to have type "any," except for labels created with `skeleton_label_spec`, whose type will be "skeleton." However, we recommend declaring your label types, as it is easy to do and helps prevent implementation and usage mistakes; a sketch follows below.
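To illustrate, here's a hypothetical spec fragment that declares explicit types for two made-up labels. With these declarations, the mapping dialog will only offer compatible task labels, and the CLI will reject shapes whose type doesn't match:

```python
import cvat_sdk.auto_annotation as cvataa

# Hypothetical spec: each label can only be mapped to task labels of a
# compatible type ("rectangle"/"any" and "polygon"/"any", respectively).
spec = cvataa.DetectionFunctionSpec(
    labels=[
        cvataa.label_spec("vehicle", 0, type="rectangle"),
        cvataa.label_spec("road", 1, type="polygon"),
    ],
)
```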
## Conclusion

With these changes, agent-based functions are catching up to the capabilities of Nuclio-based functions. However, unlike Nuclio-based functions, they can be used with CVAT Online and with other CVAT instances without requiring control over the server.

We're continuing to work on this feature to add more capabilities, so stay tuned for updates.