Earlier this summer, Meta made a US $14.3 billion bet on a company most people had never heard of: Scale AI. The deal, which gave Meta a 49 percent stake, sent Meta's competitors, including OpenAI and Google, scrambling to exit their contracts with Scale AI for fear it might give Meta insight into how they train and fine-tune their AI models.
Scale AI is a leader in data labeling for AI models. It’s an industry that, at its core, does what it says on the tin. The most basic example can be found in the thumbs-up and thumbs-down icons you’ve likely seen if you’ve ever used ChatGPT. One labels a reply as positive; the other, negative.
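To make that concrete, here is a minimal sketch, in Python, of the kind of record a single thumbs-up or thumbs-down click might produce. The field names are hypothetical illustrations, not ChatGPT's actual schema:

```python
# Minimal sketch of one piece of human feedback. The schema is
# hypothetical; real labeling pipelines store far more metadata.
from dataclasses import dataclass

@dataclass
class FeedbackRecord:
    prompt: str    # what the user asked
    response: str  # what the model replied
    label: int     # +1 for thumbs-up, -1 for thumbs-down

record = FeedbackRecord(
    prompt="Explain data labeling in one sentence.",
    response="Data labeling attaches human judgments to data so models can learn from them.",
    label=+1,  # the user clicked thumbs-up
)
```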
But as AI models grow in both size and popularity, this seemingly simple task has ballooned into a beast that every organization looking to train or tune a model must manage.
“The vast majority of compute is used on pre-training data that’s of poor quality,” says Sara Hooker, a vice president of research at Cohere Labs. “We need to mitigate that, to improve it, applying super high-quality gold dust data in post-training.”
What Is Data Labeling?
Computer scientists have long relied on the axiom "garbage in, garbage out," which holds that bad inputs inevitably lead to bad outputs.
However, as Hooker suggests, the training of modern AI models defies that axiom. Large language models are trained on raw text data scraped from the public Internet, much of which is of low quality (Reddit posts tend to outnumber academic papers).
Cleaning and sorting training data makes sense in theory, but with modern models training on petabytes of data, the sheer volume makes it impractical. That's a problem, because popular AI training data sets are known to include racist, sexist, and criminal content. Training data can also harbor subtler problems, like sarcastic or purposefully misleading advice. Put simply: a lot of garbage finds its way into the training data.
So data labeling steps in to clean up the mess. Rather than trying to scrub out all of the problematic elements of the training data, human experts manually provide feedback on the AI model’s output after the model is trained. This molds the model, reducing undesirable replies and changing the model’s demeanor.
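As a rough sketch of how that feedback is typically put to work (the records and recipe below are illustrative, not any particular lab's pipeline), endorsed replies can feed supervised fine-tuning, while endorsed-versus-rejected pairs feed preference-based methods such as RLHF or DPO:

```python
# Illustrative post-training recipe built from human feedback records.
# Both the data and the selection rules are invented for this sketch.
feedback = [
    {"prompt": "Summarize this memo.", "response": "Here is a two-line summary...", "label": +1},
    {"prompt": "Summarize this memo.", "response": "I am unable to help with that.", "label": -1},
]

# Supervised fine-tuning set: keep only the replies labelers endorsed.
sft_examples = [r for r in feedback if r["label"] > 0]

# Preference pairs (chosen vs. rejected) for methods such as RLHF or DPO:
# group records by prompt, then pair each endorsed reply with each rejected one.
by_prompt: dict[str, list[dict]] = {}
for r in feedback:
    by_prompt.setdefault(r["prompt"], []).append(r)

pairs = [
    (chosen["response"], rejected["response"])
    for records in by_prompt.values()
    for chosen in records if chosen["label"] > 0
    for rejected in records if rejected["label"] < 0
]
```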
Sajjad Abdoli, founding AI scientist at data labeling company Perle, explains this process of creating “golden benchmarks” to fine-tune AI models. What exactly that benchmark contains will depend on the purpose of the model. “We walk our customers through the procedure, and create the criteria for a quality assessment,” says Abdoli.
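Here is a hypothetical sketch of what such a quality rubric might look like in code. The criteria names and weights are invented for illustration, not Perle's actual benchmark:

```python
# Hypothetical quality rubric of the kind a data labeling team might
# agree on with a customer. Criteria and weights are invented examples.
criteria = {
    "helpful": 0.4,   # does the reply actually address the prompt?
    "accurate": 0.4,  # are the facts correct?
    "concise": 0.2,   # no padding or repetition
}

def score_reply(ratings: dict[str, float]) -> float:
    """Combine per-criterion ratings (each 0.0 to 1.0) into one weighted score."""
    return sum(weight * ratings[name] for name, weight in criteria.items())

# A labeler rates one model reply against the rubric:
print(score_reply({"helpful": 1.0, "accurate": 0.8, "concise": 0.5}))  # ≈ 0.82
```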
Consider a typical chatbot. Most companies want to build a chatbot that’s helpful, accurate, and concise, so data labelers provide feedback with those goals in mind. Human data labelers read the replies generated by the model on a set of test prompts. A reply that seems to answer the prompt with concise and accurate information would be considered…
Read full article: Meta’s Investment in AI Data Labeling Explained

The post “Meta’s Investment in AI Data Labeling Explained” by Matthew S. Smith was published on 08/01/2025 by spectrum.ieee.org