
Technology Law

3 minute read

Cloudflare’s New Economic Model to Address AI Web Scraping

Last week, Cloudflare released a private beta of “pay-per-crawl,” a technical solution that lets websites on Cloudflare automatically block known AI crawlers unless the domain owner explicitly opts in.  The tool offers publishers an off-the-shelf way to manage web crawlers and collect revenue from them, which is especially important in the age of AI.

Background

Developing and training AI models requires substantial amounts of input data, much of which is collected from public sources.  To collect this data, AI model and data providers rely on web crawlers to scrape and index internet content.  Creators and publishers have increasingly raised concerns about how their content will be used and monetized, resulting in litigation and industry efforts to address the issue.

To date, creators and publishers have relied on technical means and contract negotiation to limit scraping of their websites.  For example, many publishers place a robots.txt text file on their website to tell crawlers which content is off limits (an example appears below).  Larger publishers have negotiated deals directly with AI model providers.  As has been publicly reported, The New York Times and Amazon recently entered into a multiyear licensing agreement that will bring The New York Times editorial content to Amazon’s AI platforms.  Further, The Atlantic and OpenAI entered into a strategic partnership that leverages The Atlantic’s articles within OpenAI’s products and invites The Atlantic’s product team to help shape how news is presented in future OpenAI technologies.
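As a concrete illustration, the robots.txt file below disallows two widely documented AI crawler user agents (OpenAI’s GPTBot and Common Crawl’s CCBot) while leaving the rest of the site open to other crawlers:

```
# Disallow OpenAI's training crawler site-wide
User-agent: GPTBot
Disallow: /

# Disallow Common Crawl's crawler site-wide
User-agent: CCBot
Disallow: /

# All other crawlers may access everything
User-agent: *
Allow: /
```

Note that robots.txt is purely advisory: it signals a publisher’s preferences but does not technically enforce them, so a crawler that ignores the file can still scrape the site.  That gap is part of what tools like Cloudflare’s aim to close.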

Cloudflare’s new tool offers a potential third solution for publishers.

How Does Pay-Per-Crawl Work?

Through pay-per-crawl, publishers can block crawlers entirely, allow unrestricted access, or charge crawlers a flat, domain-wide, per-request price to access content.  Publishers can also set preferences based on a crawler’s stated purpose, and can bypass fees for specific crawlers while charging others the standard price.
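For illustration only, the sketch below models those choices as a simple Python structure.  This is a hypothetical representation of the policy options described above, not Cloudflare’s actual configuration interface, and the field and crawler names are invented:

```python
# Hypothetical sketch of a publisher's pay-per-crawl policy. This is NOT
# Cloudflare's configuration format; it only illustrates the three choices
# described above (block, allow, or charge) plus per-crawler bypasses.
policy = {
    "default_action": "charge",       # one of "block", "allow", "charge"
    "price_per_request_usd": 0.01,    # flat, domain-wide, per-request price
    "overrides": [
        # Bypass fees for a specific, verified partner crawler...
        {"crawler": "example-partner-bot", "action": "allow"},
        # ...while blocking another crawler outright.
        {"crawler": "example-unwanted-bot", "action": "block"},
    ],
}
```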

Cloudflare uses a bot detection system to identify AI crawlers and distinguish them from other bots, such as search engine crawlers.  Through a bot verification system, these crawlers can declare their purpose for requesting content and for whom they work.  Cloudflare uses a rules engine to apply the publisher’s existing web application firewall (WAF) policies and bot management preferences before enforcing the publisher’s pay-per-crawl decisions.  Then, it presents crawlers with status codes reflecting the publisher’s control and pricing preferences.  If a publisher charges for access, the crawler receives a status code signaling that payment is required for the requested resource, enabling the crawler to accept or reject the publisher’s price.  As the Merchant of Record, Cloudflare records and aggregates the crawlers’ billing events, charges the crawlers, and distributes the funds to the publisher.
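Cloudflare has described this payment signal as the standard HTTP 402 (“Payment Required”) status code.  The Python sketch below shows, from the crawler’s side, roughly what that exchange could look like; the header names (crawler-max-price, crawler-price) are assumptions used here for illustration, not a documented interface:

```python
# Rough, crawler-side sketch of a pay-per-crawl exchange under the
# assumptions stated above. Requires the third-party "requests" library.
import requests

MAX_PRICE_USD = 0.01  # the most this crawler is willing to pay per request

def fetch(url: str) -> bytes | None:
    # Declare the maximum acceptable per-request price up front
    # (hypothetical request header).
    resp = requests.get(url, headers={"crawler-max-price": str(MAX_PRICE_USD)})

    if resp.status_code == 402:
        # Payment required: the publisher's asking price (hypothetical
        # response header) exceeded what this crawler offered.
        asking = resp.headers.get("crawler-price", "unknown")
        print(f"Declined {url}: publisher asks {asking} per request")
        return None

    resp.raise_for_status()  # surface blocks or other errors
    return resp.content      # success: the request will be billed
```

In practice, a crawler would also need to present verified identity under Cloudflare’s bot verification system before any pricing decision applies; that handshake is omitted here for brevity.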

Cloudflare is uniquely situated to offer such a program: it provides global cloud services for millions of active websites and manages and protects traffic for 20% of the internet.  The company supports content delivery by detecting and mitigating security threats, such as DDoS attacks.  Cloudflare claims to have the world’s most advanced bot management solutions and has been helping domain owners manage crawler interactions for years.  Several major publishers have already indicated support for this permission-based model. 

What Does This Mean for Publishers?

Through pay-per-crawl’s bypass feature, publishers retain the ability to charge crawlers through arrangements made outside of Cloudflare.  This flexibility opens the door to more negotiations and content partnerships between AI model and data providers on one side and creators and publishers on the other, which is particularly important as publishers adapt to an internet built, in part, for AI agents.

Developing an effective model that fosters a mutually beneficial relationship will require cooperation between publishers and creators on one side and crawlers and AI model providers on the other.  The audit trail Cloudflare creates to manage crawler requests and billing relationships could also enhance publishers’ monitoring capabilities, providing greater transparency into data collection and scraping behavior.  On the other hand, widespread adoption of high crawl fees could significantly increase training data costs, potentially creating downstream price increases for users.

Past efforts to monetize content through means other than advertising have struggled to gain traction, so whether this model achieves wide adoption, both among publishers and among AI model providers willing to pay, remains to be seen.  Until then, publishers should consider whether pay-per-crawl presents a viable option for their business and assess whether implementation is feasible from legal, economic, and technical perspectives.

Tags

ai, crawlers, technology law, publishers, data scraping