Cloudflare’s AI Audit: Combatting Unwanted AI Bot Traffic

Modern generative AI models such as large language models are trained on huge amounts of data, much of which is scraped from the web autonomously by bots. Cloudflare, one of the world’s largest content delivery networks (CDNs), has launched a tool to beat back the bot hordes: AI Audit.

Launched in beta on 23 September and generally available to Cloudflare customers, AI Audit gives site owners new visibility into the activities of AI bots scraping their sites. They can see which AI model providers are accessing their content and decide whether to allow or block them. In the future, Cloudflare plans to help content owners set a fair price that AI bots must pay to crawl a site’s content.

“We set a goal at Cloudflare to help build a better Internet. An Internet where great content gets published, and great communities get built,” said Sam Rhea, VP of emerging technology at Cloudflare. “But one thing that makes us nervous is that some AI use cases potentially put that at risk.”

New Protection From Unwanted Bot Traffic

Many websites try to manage unwanted bots with robots.txt, a file that instructs bots on how to behave when crawling the site. But it’s not foolproof: Bots can simply ignore the instructions.
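For illustration, here is a minimal robots.txt that asks two publicly documented AI crawlers to stay away. The bot names (GPTBot is OpenAI’s crawler, CCBot is Common Crawl’s) are real, but the policy shown is a hypothetical example, not a recommendation from Cloudflare:

    # Served from the site root, e.g. https://example.com/robots.txt
    # Ask OpenAI's documented crawler to skip the entire site.
    User-agent: GPTBot
    Disallow: /

    # Ask Common Crawl's bot to skip it as well.
    User-agent: CCBot
    Disallow: /

    # Everything else may crawl normally.
    User-agent: *
    Allow: /

Nothing enforces these rules; a crawler that ignores them faces no technical barrier, which is exactly the gap tools like AI Audit aim to close.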

Cloudflare’s AI Audit doesn’t rely on robots.txt but instead uses the company’s Web Application Firewall, a service that can automatically identify the source of web traffic. While probably best known for its defense against distributed denial of service (DDoS) attacks, which use bot networks to bombard victims with requests, the firewall can also identify bots used by major AI companies such as OpenAI.
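Cloudflare has not published the details of how its firewall fingerprints AI bots, but the most basic signal any such system starts from is the User-Agent header. The Python sketch below is a hypothetical illustration of that single signal, not Cloudflare’s implementation; the crawler names in AI_BOT_SIGNATURES are publicly documented, and everything else is assumed for the example:

    # Hypothetical sketch: naive AI-bot detection by User-Agent header.
    # Production firewalls also verify source IP ranges and request
    # behavior, since a User-Agent string is trivially spoofed.
    AI_BOT_SIGNATURES = (
        "GPTBot",      # OpenAI
        "CCBot",       # Common Crawl
        "ClaudeBot",   # Anthropic
        "Bytespider",  # ByteDance
    )

    def classify_request(headers: dict) -> str:
        """Return 'ai-bot' if the User-Agent matches a known AI crawler."""
        user_agent = headers.get("User-Agent", "")
        if any(sig in user_agent for sig in AI_BOT_SIGNATURES):
            return "ai-bot"
        return "unclassified"

    # Example: a request presenting OpenAI's crawler user agent.
    print(classify_request({"User-Agent": "Mozilla/5.0 (compatible; GPTBot/1.0)"}))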

Serving web pages to AI bots typically isn’t a burden for large sites with significant funding. Logan Abbott, the president of SourceForge and Slashdot, said the two sites “see tens of millions of AI crawler sessions every month,” but have infrastructure in place to handle the load.

However, bots can be a problem for sites owned by small companies and individuals. BingeClock, a site that helps TV super-fans track the shows they watch (and how long it takes to watch them), was forced to add server resources to handle the load that bots placed on the site.

Cloudflare’s AI Audit provides data analytics to track and block AI bots. Image: Cloudflare

“So all summer, I was adding extra [Amazon Web Services] instances for my API, as I found the site was becoming unusable for the actual users,” said Billy Gardner McIntyre, a freelance developer and writer who operates BingeClock by himself. Larger sites might handle the issue with dynamic load balancing, which automatically spins up new instances as required. But that approach can lead to unpredictable spikes in service costs, which is risky for people who operate smaller websites and businesses.
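As a sketch of what that looks like in practice, the boto3 snippet below attaches a target-tracking policy to a hypothetical EC2 Auto Scaling group. The group name “api-asg” and the 50 percent CPU target are assumptions for illustration:

    # Hypothetical sketch of target-tracking autoscaling with boto3.
    # Assumes an existing Auto Scaling group named "api-asg" (illustrative).
    import boto3

    autoscaling = boto3.client("autoscaling")

    # Keep average CPU near 50%; AWS adds instances when load (e.g., from
    # crawler traffic) pushes CPU above the target, and removes them after.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="api-asg",
        PolicyName="cpu-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,
        },
    )

Every instance the policy launches is billed for as long as it runs, so a sustained crawler surge converts directly into a larger, hard-to-predict bill.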

Cloudflare’s AI Audit provided relief for McIntyre, who wrote about his experience on BingeClock’s engineering blog. He noticed a substantial decrease in unwanted AI traffic….

The post “Cloudflare’s AI Audit: Combatting Unwanted AI Bot Traffic” by Matthew S. Smith was published on 10/07/2024 by spectrum.ieee.org