Researchers analyzing data from the Internet Archive found that up to one-third of websites created since 2022 were AI-generated or AI-assisted. The study, conducted by experts from Stanford, Imperial College London, and the Internet Archive, was published in the paper “The Impact of AI-Generated Text on the Internet.” It also suggests that AI-generated text is making the web more engaging while reducing verbosity.
The researchers, drawing inspiration from the “Dead Internet Theory,” which argues that much of the internet may be driven by bots, set out to assess how tools such as ChatGPT and its competitors have influenced the web since 2022.
They tested six common criticisms of AI-generated text: whether it narrows perspectives, increases misinformation through hallucinations, makes online writing feel overly sanitized and cheerful, fails to cite sources, lowers semantic density, and pushes writing toward a monotonous cultural voice that erodes unique voices.
To evaluate these claims, the team defined measurable signals for each hypothesis, calculated them monthly for each sampled website, and examined correlations with an AI-likelihood score.
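The paper's analysis code is not included in the article, but the described procedure can be sketched roughly as follows. The per-site records, the signal values, and the use of Pearson's r are all assumptions for illustration; in the study, the signals would be concrete measures such as citation frequency or lexical variety.

```python
from statistics import correlation  # Pearson's r, available in Python 3.10+

# Hypothetical per-site, per-month records: one signal value and one
# AI-likelihood score for each (site, month) pair. The numbers are invented.
records = [
    {"site": "example-a.com", "month": "2023-01", "signal": 0.42, "ai_likelihood": 0.10},
    {"site": "example-a.com", "month": "2023-02", "signal": 0.39, "ai_likelihood": 0.35},
    {"site": "example-b.com", "month": "2023-01", "signal": 0.61, "ai_likelihood": 0.05},
    {"site": "example-b.com", "month": "2023-02", "signal": 0.58, "ai_likelihood": 0.12},
]

signal_values = [r["signal"] for r in records]
ai_scores = [r["ai_likelihood"] for r in records]

# A strong correlation in the expected direction would count as support for
# the hypothesis that defined this signal; a flat correlation would not.
r = correlation(signal_values, ai_scores)
print(f"Pearson r between signal and AI-likelihood: {r:.3f}")
```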
From August 2022 to May 2025, the researchers sampled websites and retrieved the oldest archived snapshot of each URL via the Wayback Machine API, then downloaded and stored the raw HTML for later processing.
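The article does not show the retrieval code, but a minimal sketch of this step using the public Wayback Machine CDX API might look like the following. The endpoint and query parameters are standard CDX features; the example URL is illustrative, and the August 2022 start date comes from the study's sampling window.

```python
import requests

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def earliest_capture_html(url: str, start: str = "20220801") -> str | None:
    """Return the raw HTML of the oldest archived capture of `url` on or after `start`."""
    params = {
        "url": url,
        "output": "json",
        "from": start,          # only consider captures from August 2022 onward
        "filter": "statuscode:200",
        "limit": 1,             # CDX results are sorted oldest-first, so one row suffices
        "fl": "timestamp,original",
    }
    rows = requests.get(CDX_ENDPOINT, params=params, timeout=30).json()
    if len(rows) < 2:           # first row is the header; anything less means no capture
        return None
    timestamp, original = rows[1]
    # The "id_" suffix asks the Wayback Machine for the capture as originally served,
    # without the archive's navigation banner injected into the HTML.
    snapshot_url = f"https://web.archive.org/web/{timestamp}id_/{original}"
    return requests.get(snapshot_url, timeout=30).text

html = earliest_capture_html("example.com")
print(html[:200] if html else "no capture found")
```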
The team used Pangram v3, an AI-detection tool, to identify AI-generated sites; of the several detectors they evaluated, Pangram v3 produced the highest detection rate. Sites that Pangram v3 flagged as AI-generated then formed the basis for testing the six hypotheses.
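Neither Pangram's interface nor the decision threshold is described in the article, so the sketch below uses a hypothetical `pangram_v3_score` stand-in and an assumed 0.5 cutoff; only the flag-then-analyze flow mirrors what the researchers describe.

```python
def pangram_v3_score(html: str) -> float:
    """Hypothetical stand-in for the Pangram v3 detector, whose real API is not
    given in the article. Replace with a call to the actual service; the constant
    returned here is only a placeholder so the sketch runs."""
    return 0.0

def flag_ai_sites(pages: dict[str, str], threshold: float = 0.5) -> dict[str, float]:
    """Keep only pages whose AI-likelihood score clears the (assumed) threshold.
    `pages` maps each URL to the raw HTML of its archived capture."""
    scores = {url: pangram_v3_score(html) for url, html in pages.items()}
    return {url: score for url, score in scores.items() if score >= threshold}

# Sites flagged here would then feed into the six hypothesis tests described above.
flagged = flag_ai_sites({"example.com": "<html>...</html>"})
print(flagged)
```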
Only two of the six hypotheses appeared to be supported by the data.
Notably, the researchers found no increase in verifiable misinformation and no rise in source omission. They suggested that AI may instead be swelling the volume of unverifiable statements that current fact-checking tools cannot assess, or that the internet was never a consistently truth-keeping environment to begin with.
Maty Bohacek, a Stanford research assistant and co-author, said the team is working with the Internet Archive to turn the approach into a tool that produces these signals continuously rather than as a single snapshot. The researchers also plan to examine which types of websites are most affected, broken down by category and language.
For Doležal, studies like this are meant to help preserve an internet that remains useful. As AI-generated content spreads, the challenge is to define a role for these models that avoids producing a sanitized, repetitive web. Rather than demanding full compliance from the models, letting them retain distinctive personalities could make them creative partners instead of replacements for the human voice.