IETF proposals on web crawling draw criticism from digital rights groups

Digital rights groups are criticizing IETF proposals on AI-related web crawling and bot authentication, saying the standards debate could affect lawful automated access to public web data.

Rohit Kumar

Digital rights groups are criticizing proposals under discussion at the Internet Engineering Task Force (IETF), saying the changes could make it easier for websites to restrict lawful crawling and scraping of public web data.

According to the source article, the debate centers on automated access to publicly available online information, including crawling and scraping used by journalists, researchers, watchdog organizations, the Internet Archive, and comparison-shopping tools. The source says some companies are seeking changes to IETF technical standards, while critics argue those changes could make it easier for website operators to restrict lawful automated access.

This dispute sits within a broader Policy, Ethics & Law discussion about AI, internet governance, and access to public information. It also overlaps with coverage of liability and market questions around Google AI Overviews, and with the copyright and publisher disputes discussed on Ctrl-Alt-Speech.

Automated access and public web uses

The source article says automated access supports several lawful and widely used activities. Examples listed in the source include reporting by journalists, research by academics and watchdog organizations, preservation work by the Internet Archive, and automated comparison-shopping tools.

The source adds that the Internet Archive uses crawling to preserve copies of websites. Background on robots.txt, the file websites use to tell crawlers which paths they may fetch, is available in RFC 9309: The Robots Exclusion Protocol, which documents the current standard.
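To make the robots.txt mechanism concrete, here is a minimal sketch of how a crawler might consult a robots.txt file before fetching a page, using Python's standard-library urllib.robotparser module. The file contents and the user-agent names ExampleAIBot and ResearchCrawler are hypothetical, chosen only for illustration, and the example does not use any of the AI preference vocabularies under discussion at the IETF. Because the crawler runs the check itself, honoring robots.txt is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL robots.txt: one group blocks an AI crawler entirely,
# while the catch-all group only blocks a private directory.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that the crawler has already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    checks = [
        ("ExampleAIBot/1.0", "https://example.com/articles/1"),
        ("ResearchCrawler/2.0", "https://example.com/articles/1"),
        ("ResearchCrawler/2.0", "https://example.com/private/report"),
    ]
    for agent, url in checks:
        # can_fetch() returns True only if the rules for this agent allow the URL.
        print(agent, url, parser.can_fetch(agent, url))
```

Running the script prints False for ExampleAIBot, True for ResearchCrawler on the article page, and False for ResearchCrawler on the private path. Depending on the Python version, the standard-library parser may not implement every RFC 9309 detail, such as longest-match precedence between rules, but for a file this simple the results are the same either way.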

EFF is quoted in the source as saying: “The ability to access publicly available information using automated tools is a central value and benefit of a free and open internet.”

AI Preferences working group

One of the IETF efforts identified in the source is the AI Preferences working group. According to the article, the group is developing proposals that would let publishers express “preference signals” against crawling web data for AI-related purposes.

The source says those signals would be expressed through robots.txt. The AI-related purposes it lists include training models, generating outputs, and helping users search the web. For reference, the IETF maintains official materials for the AI Preferences Working Group.

The article further claims that these preference signals could become legally binding in some jurisdictions, but it does not identify which ones.

Web Bot Auth working group

The source also highlights the Web Bot Auth working group. According to the article, that group is working on ways to protect sites from bots that strain their resources, as well as on standards changes that would let sites cryptographically identify bots.

Official information about that effort is available through the IETF's Web Bot Auth Working Group.

According to the source, critics say such identification systems could be used to block competitors, dissidents, or bots that are not licensed or otherwise approved. The source also claims that if crawling were limited to preapproved authenticated bots, websites could require licensing payments for automated access.
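A simplified model helps show why critics focus on gatekeeping. The following Python sketch is hypothetical and does not reproduce the Web Bot Auth working group's mechanism: a site first checks that a request carries a valid signature from a bot it can identify, then consults an allowlist. Real bot-authentication schemes generally rely on public-key signatures; this sketch swaps in a standard-library HMAC shared secret so it runs without third-party packages. The bot names, keys, allowlist, and five-minute freshness window are all invented for illustration.

```python
import hashlib
import hmac

# HYPOTHETICAL keys for bots this site can identify. Real bot-authentication
# schemes generally use public-key signatures; a shared-secret HMAC is a
# standard-library stand-in used only to keep the sketch self-contained.
KNOWN_BOT_KEYS = {
    "ExampleArchiveBot": b"archive-demo-key",
    "ExampleStartupBot": b"startup-demo-key",
}

# HYPOTHETICAL allowlist of bots the site has approved (for example, licensed).
APPROVED_BOTS = {"ExampleArchiveBot"}

# HYPOTHETICAL freshness window: signatures whose timestamp is further than
# this from the site's clock are rejected, which limits replay.
MAX_AGE_SECONDS = 300


def sign_request(key: bytes, method: str, path: str, timestamp: int) -> str:
    """Bot side: HMAC-SHA256 tag over the method, path, and timestamp."""
    message = f"{method} {path} {timestamp}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def classify_request(bot_name: str, method: str, path: str,
                     timestamp: int, signature: str, now: int) -> str:
    """Site side: first identify the bot, then apply the allowlist policy."""
    key = KNOWN_BOT_KEYS.get(bot_name)
    if key is None or abs(now - timestamp) > MAX_AGE_SECONDS:
        return "unidentified"
    expected = sign_request(key, method, path, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, signature):
        return "unidentified"
    # Identity is now established; admission depends only on site policy.
    if bot_name not in APPROVED_BOTS:
        return "identified-not-approved"
    return "allowed"


if __name__ == "__main__":
    now = 1_700_000_000
    for bot in ("ExampleArchiveBot", "ExampleStartupBot"):
        sig = sign_request(KNOWN_BOT_KEYS[bot], "GET", "/page", now)
        print(bot, classify_request(bot, "GET", "/page", now, sig, now))
```

In this model, ExampleStartupBot proves its identity but is still turned away, because admission is decided by the allowlist rather than by the signature. That second step is where critics say sites could exclude unapproved bots or require licensing payments.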

Broader policy implications

The source article says website operators increasingly want to block bots that crawl public web content for AI training or operation. It also says AI bots can strain website infrastructure, in some cases degrading performance or taking sites offline. In addition, the source states that some publishers are concerned users may rely on AI-generated overviews instead of visiting source websites, a theme that also appears in broader AI Marketing & Search coverage. That coverage area also includes the AI access concerns Macron and Modi raised at the G7.

At the same time, the source claims the proposed standards would give websites greater ability to block lawful scraping and crawling, affecting researchers, archivists, startups, accessibility tools, and accountability research.

EFF is quoted in the source as saying: “However reasonable these fears may be, the answer is not to change the IETF standards from neutral protocols that encourage openness to restrictive requirements designed to monetize internet access.” The source also states that EFF and allied groups have resisted some IETF proposals and will continue doing so.

The source article does not provide a publication date, statistics, voting outcomes, implementation timelines, or the names of specific companies backing the proposals.


Written by

Rohit Kumar

Senior Software Engineer at GenerativeDaily

I'm a web developer in Ranchi specializing in Next.js, React, Tailwind CSS, TypeScript, and modern full stack web applications.

