Calculating the Cost of Image Annotation for AI Projects: Annotating Solo

Creating computer vision AI systems requires meticulous training and fine-tuning of deep learning (DL) models using annotated images or videos. These annotations are crucial for developing AI products capable of accurate analysis, prediction, and reliable results. However, the process of image annotation significantly contributes to the overall cost of developing such systems.

"Instead of focusing on the code, companies should focus on developing systematic engineering practices for improving data in ways that are reliable, efficient, and systematic. In other words, companies need to move from a model-centric approach to a data-centric approach."

— Andrew Ng, CEO and Founder of Landing AI

How can you calculate the optimal price for image annotation to include in your budget? 

We'll explore the various factors that influence the cost of image and video annotation. More importantly, we'll discuss why the price of image annotation should not be your only consideration when training and fine-tuning computer vision models.

Prerequisites

To better understand the dynamics of daily life, let’s consider a common scenario: life at home. 

Most of us live in houses, often not alone but with families. These families can vary in size and composition—ranging from small units to large, bustling households with children, pets, and elderly members who require special attention and care.

This variety can lead to issues that are relevant to all living areas: children might leave toys like LEGO pieces scattered on the floor, elderly individuals may misplace their glasses or other medical devices and struggle to find them, and pets could shed fur or leave other surprises around. All of these factors contribute to a household's everyday chaos.

Certainly, several solutions are already available on the market, such as automatic vacuum cleaners and electric mops. However, let's consider the possibility that these devices might not be as smart as we need them to be.

As a scientist leading a small research team, you aim to introduce an innovative product to the market—a smart home assistant robot. This advanced robot will differentiate between actual dirt and valuable items. It will clean up the former and signal the latter's presence, aiding in retrieving lost items. This functionality will not only keep homes cleaner but also make it easier to find misplaced objects.



For research purposes, the scientist and their team have gathered a dataset comprising 100,000 images of various rooms with items scattered on the floor.



The volume of 100,000 images comes from the average batch size we typically see in robotics projects. This number is supported by the available datasets in the public domain, where the quantity of images usually ranges from 10,000 to several million per dataset.

Let’s assume that one image has 23 objects on average. That means you need to annotate roughly 2,300,000 objects in total.

This series of articles describes four cases for dealing with such a task:

  • Case 1: You handle the task yourself or with minimal colleague help.
  • Case 2: You hire annotators and try to build a team yourself. 
  • Case 3: You outsource the task to professionals. 
  • Case 4: You crowdsource the task.

Case 1: You handle the task yourself or with minimal help from colleagues

A small disclaimer: annotating solo is fine for small amounts of data, but doesn’t work for big datasets. And here is why.

The Annotation Stage

For the robotics project, the scientist needs to select useful frames from the extensive video collection and create a detailed data annotation specification. Accurate and precise polygon annotations will be used to label objects in the images.

Let’s assume that, according to the data annotation specification, 40 classes will be annotated using polygons, with each instance annotated separately. A basic description of how to annotate is necessary; a full specification can run 30-50 pages and includes detailed instructions on how to annotate each class correctly, with good and bad examples and corner cases. Writing a specification also takes time, typically measured in days or weeks.

The time required to annotate an object using polygons can vary depending on several factors, including the complexity and size of the object, the clarity of the image, and the expertise of the annotator.

On average, it can take anywhere from a few seconds to several minutes per object. Here are some general estimates:

  • Simple Object (e.g., a rectangular object): 5-10 seconds
  • Moderately Complex Object (e.g., a car): 30-60 seconds
  • Highly Complex Object (e.g., a human with detailed limb annotations): 1-3 minutes or more

Detailed polygon annotations can take significantly longer for precise tasks, especially for objects with intricate details and irregular shapes.
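To make these figures concrete, here is a minimal sketch of how per-object times add up to annotation time per image; the mix of simple, moderate, and complex objects below is an assumption, chosen to match the 23-object average.

```python
# Rough per-image annotation time; the object mix below is an assumed example.
object_mix = [
    (10, 12),   # simple objects: ~10 s each, ~12 per image
    (45, 8),    # moderately complex objects: ~45 s each, ~8 per image
    (120, 3),   # highly complex objects: ~120 s each, ~3 per image
]
seconds_per_image = sum(seconds * count for seconds, count in object_mix)
print(f"~{seconds_per_image / 60:.1f} minutes of polygon work per image")
```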

If the quality requirements permit, AI tools like the Segment Anything Model can be used to speed up the annotation process. However, for some tasks, these models often lack the precision needed and require extensive manual corrections.
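As an illustration, here is a minimal sketch of generating candidate masks with the Segment Anything Model before manual correction; the model type, checkpoint file, and image path are assumptions, and every proposed mask still has to be reviewed and fixed by hand.

```python
# pip install git+https://github.com/facebookresearch/segment-anything.git
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Model type and checkpoint path are assumptions; use whichever SAM checkpoint you have.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("room_0001.jpg"), cv2.COLOR_BGR2RGB)

# Each result holds a binary mask plus metadata (area, bounding box, predicted IoU).
masks = mask_generator.generate(image)
print(f"{len(masks)} candidate masks proposed for manual review")
```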

Let's focus on the task at hand. We are dealing with images of rooms scattered with small objects. Typically, a skilled annotator can label each object in about 40-50 seconds. However, since our scientist does not perform annotations daily, the expected annotation speed in our case will be approximately 60 seconds (1 minute) per object.

Now let’s talk about money and costs. People sometimes assume that annotating data themselves is cheap because they do not account for their own time, which is paid time unless the annotation is done outside of working hours.

Let's assume the robotics engineer is from the USA and annotation is done during working hours. We will research job postings on Indeed, the well-known job aggregator site, and then check the average salary before taxes.

The average salary calculated from the data provided is approximately $42 per hour (for June 2024).

All that's left is to add the cost of the annotation tool. This cost can be zero if the scientist is tech-savvy and can install a self-hosted solution. However, if that's not the case, the scientist will need a tool that may be free or cost some money.

If you plan to annotate yourself or ask a colleague or two to help you, so that you can work as a small team, in the case of CVAT it will cost you $33 per seat.

Remember that even tools that are free to download and install require time and resources to set up and support, and time is money. So, while we say "free," it means that you can download and install the tool, but the rest depends on your time, expertise, and effort (and how much of your paid time will be spent on this).

Let’s sum it up:

First, we calculate the total amount of hours that the scientist will need to annotate all objects:

2,300,000 objects × 60 seconds = 138,000,000 seconds.

138,000,000 seconds / 3,600 = 38,333 hours (rounded to the nearest whole number).

In the best-case scenario, it will take:

  • 4,792 working days
  • 240 months, or roughly 20 years, of one person's work

That is assuming the scientist drops all other duties and dedicates 8 hours daily solely to annotation.

The cost of the annotation will be:

38,333 hours × $42 = $1,609,986, plus the cost of the tool on top.
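For reference, here is a short sketch that pulls the numbers above together; the 20-working-days-per-month conversion is the assumption behind the month and year figures.

```python
# Back-of-the-envelope cost of annotating solo, using the figures above.
IMAGES = 100_000
OBJECTS_PER_IMAGE = 23
SECONDS_PER_OBJECT = 60          # non-expert annotation speed
HOURLY_RATE_USD = 42             # average salary, June 2024
HOURS_PER_DAY = 8
WORKING_DAYS_PER_MONTH = 20      # assumption for the month/year conversion

objects = IMAGES * OBJECTS_PER_IMAGE               # 2,300,000
hours = objects * SECONDS_PER_OBJECT / 3600        # ~38,333
days = hours / HOURS_PER_DAY                       # ~4,792
months = days / WORKING_DAYS_PER_MONTH             # ~240, i.e. ~20 years
cost = hours * HOURLY_RATE_USD                     # ~$1,610,000, tool excluded

print(f"{objects:,} objects -> {hours:,.0f} h -> {days:,.0f} working days -> "
      f"{months:,.0f} months -> ${cost:,.0f}")
```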

Note that the described approach lacks scalability. In the future, maintaining the dataset and addressing any emerging issues will be necessary. Additionally, deployment in a production environment typically requires a significantly larger volume of data. Of course, the engineer can ask colleagues to help; that may reduce the time, but not the cost.

The Quality Assurance Stage

To ensure quality assurance when annotating data independently, an automated system known as a "Honeypot" can be used.

The Honeypot method is cost-effective but pretty time-consuming. It involves setting aside approximately 3% of your dataset, or about 3,000 images from a set of 100,000, specifically for quality checks.

You will need to use a previously created specification that outlines your annotation requirements and standards. Annotate this selected subset of images yourself to serve as a benchmark. While this method saves time in the long run, it still requires an initial investment of time and resources to set up and perform these annotations, which translates to a monetary cost.
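Below is a minimal sketch of carving out such a benchmark subset, assuming all images sit in one folder; the paths and the 3% share are illustrative.

```python
# Select a ~3% "honeypot" subset to annotate yourself as a quality benchmark.
import random
from pathlib import Path

random.seed(42)                                   # reproducible selection
all_images = sorted(Path("dataset/images").glob("*.jpg"))
honeypot = random.sample(all_images, int(len(all_images) * 0.03))

# Annotate these images strictly according to the specification; later,
# compare new annotations on them against your reference labels.
with open("honeypot_list.txt", "w") as f:
    f.writelines(f"{path.name}\n" for path in sorted(honeypot))
```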

***

And that’s it. Feel free to leave any comments on our social networks, and we'll gladly respond. In our next update, we will answer the question of how much an in-house annotation team costs.

Not a CVAT.ai user? Click through and sign up here.

Don't want to miss updates and news? Have any questions? Join our community:

Facebook

Discord

LinkedIn

Gitter

GitHub

June 20, 2024
CVAT Team