Researchers analyzing data from the Internet Archive found that up to one-third of websites created since 2022 were AI-generated or AI-assisted. The study, conducted by experts from Stanford, Imperial College London, and the Internet Archive, was published in the paper “The Impact of AI-Generated Text on the Internet.” It also suggests that AI-generated text is making the web more engaging while reducing verbosity.
The researchers, drawing inspiration from the “Dead Internet Theory,” which argues that much of the internet may be driven by bots, set out to assess how tools such as ChatGPT and its competitors have influenced the web since 2022.
They tested six common criticisms of AI-generated text: whether it narrows perspectives, increases misinformation through hallucinations, makes online writing feel overly sanitized and cheerful, fails to cite sources, lowers semantic density, and pushes writing toward a monotonous cultural voice that erodes unique voices.
To evaluate these claims, the team defined measurable signals for each hypothesis, calculated them monthly for each sampled website, and examined correlations with an AI-likelihood score.
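The paper's analysis code is not included in the article, but the described procedure can be sketched roughly as follows. The per-site records, the signal values, and the use of Pearson's r are all assumptions for illustration; in the study, the signals would be concrete measures such as citation frequency or lexical variety.

```python
from statistics import correlation  # Pearson's r, available in Python 3.10+

# Hypothetical per-site, per-month records: one signal value and one
# AI-likelihood score for each (site, month) pair. The numbers are invented.
records = [
    {"site": "example-a.com", "month": "2023-01", "signal": 0.42, "ai_likelihood": 0.10},
    {"site": "example-a.com", "month": "2023-02", "signal": 0.39, "ai_likelihood": 0.35},
    {"site": "example-b.com", "month": "2023-01", "signal": 0.61, "ai_likelihood": 0.05},
    {"site": "example-b.com", "month": "2023-02", "signal": 0.58, "ai_likelihood": 0.12},
]

signal_values = [r["signal"] for r in records]
ai_scores = [r["ai_likelihood"] for r in records]

# A strong correlation in the expected direction would count as support for
# the hypothesis that defined this signal; a flat correlation would not.
r = correlation(signal_values, ai_scores)
print(f"Pearson r between signal and AI-likelihood: {r:.3f}")
```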
From August 2022 to May 2025, the researchers sampled websites and retrieved the oldest archived snapshot of each URL via the Wayback Machine API, then downloaded and stored the raw HTML for later processing.
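The article does not show the retrieval code, but a minimal sketch of this step using the public Wayback Machine CDX API might look like the following. The endpoint and query parameters are standard CDX features; the example URL is illustrative, and the August 2022 start date comes from the study's sampling window.

```python
import requests

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def earliest_capture_html(url: str, start: str = "20220801") -> str | None:
    """Return the raw HTML of the oldest archived capture of `url` on or after `start`."""
    params = {
        "url": url,
        "output": "json",
        "from": start,          # only consider captures from August 2022 onward
        "filter": "statuscode:200",
        "limit": 1,             # CDX results are sorted oldest-first, so one row suffices
        "fl": "timestamp,original",
    }
    rows = requests.get(CDX_ENDPOINT, params=params, timeout=30).json()
    if len(rows) < 2:           # first row is the header; anything less means no capture
        return None
    timestamp, original = rows[1]
    # The "id_" suffix asks the Wayback Machine for the capture as originally served,
    # without the archive's navigation banner injected into the HTML.
    snapshot_url = f"https://web.archive.org/web/{timestamp}id_/{original}"
    return requests.get(snapshot_url, timeout=30).text

html = earliest_capture_html("example.com")
print(html[:200] if html else "no capture found")
```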
The team used Pangram v3, an AI-detection tool, to identify AI-generated sites; of the several detectors they evaluated, Pangram v3 produced the highest detection rate. Sites that Pangram v3 flagged as AI-generated then formed the basis for testing the six hypotheses.
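Neither Pangram's interface nor the decision threshold is described in the article, so the sketch below uses a hypothetical `pangram_v3_score` stand-in and an assumed 0.5 cutoff; only the flag-then-analyze flow mirrors what the researchers describe.

```python
def pangram_v3_score(html: str) -> float:
    """Hypothetical stand-in for the Pangram v3 detector, whose real API is not
    given in the article. Replace with a call to the actual service; the constant
    returned here is only a placeholder so the sketch runs."""
    return 0.0

def flag_ai_sites(pages: dict[str, str], threshold: float = 0.5) -> dict[str, float]:
    """Keep only pages whose AI-likelihood score clears the (assumed) threshold.
    `pages` maps each URL to the raw HTML of its archived capture."""
    scores = {url: pangram_v3_score(html) for url, html in pages.items()}
    return {url: score for url, score in scores.items() if score >= threshold}

# Sites flagged here would then feed into the six hypothesis tests described above.
flagged = flag_ai_sites({"example.com": "<html>...</html>"})
print(flagged)
```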
Only two of the six hypotheses appeared to be supported by the data.
Notably, the researchers found no increase in verifiable misinformation and no rise in source omission. They suggested that AI may instead be swelling the volume of unverifiable statements that current fact-checking tools cannot assess, or that the internet was never a consistently truth-keeping environment to begin with.
Maty Bohacek, a Stanford research assistant and co-author, said the team is working with the Internet Archive to turn the approach into a tool that produces these signals continuously rather than as a single snapshot. The researchers also plan to examine which types of websites are most affected, broken down by category and language.
For Doležal, studies like this are meant to help preserve an internet that remains useful. As AI-generated content spreads, the challenge is to define a role for these models that avoids producing a sanitized, repetitive web. Rather than demanding full compliance from the models, letting them retain distinctive personalities could make them creative partners instead of replacements for the human voice.