How Much of the Web Is AI-Generated? The Data

By mid-2025, roughly 35% of newly published websites were AI-generated or AI-assisted — up from essentially zero before ChatGPT launched in late 2022. That number comes from researchers at Stanford, Imperial College London, and the Internet Archive, who analyzed 33 months of the web through the Wayback Machine. So the "dead internet" jokes have a real number behind them now. But the more interesting finding is the one that didn't make the headlines: the data does not support the thing everyone believes about AI making the web look the same.

If you've read "What Is AI Design Slop?", you know the visual tells are real. This piece is about separating what the data proves from what we just assume, because the honest version is more useful than the hot take, and a lot harder to dismiss.

What the data confirms

Three findings hold up, and they're the ones worth repeating:

Semantic contraction. AI-generated pages were 33% more semantically similar to each other than human-written ones. Content is converging on the same meanings — the same phrasings, the same framings, the same safe middle.
Artificial positivity. AI content showed 107% higher positive sentiment. Everything's "exciting" and "seamless" and "powerful." The web is being relentlessly upsold by machines that don't know how to be neutral.
Less diversity than a search box. A separate 2025 study across 27 language models, 155 topics, and 12 countries found every model produced output less epistemically diverse than a basic web search. Ask an AI and you see a narrower slice of the world than if you'd just searched.

Stack those up and the "homogenization" worry is partly earned. The web is getting more samey in what it says and how upbeat it says it.
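For a rough feel of what a "more semantically similar" measurement looks like, here's a minimal sketch. It is not the study's method: the researchers' actual similarity pipeline isn't described in this piece, and real work would use sentence embeddings rather than the plain word counts assumed here. The function names and sample texts are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Crude stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(texts: list[str]) -> float:
    """Average cosine similarity over every pair of texts in a set."""
    vectors = [bag_of_words(t) for t in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two texts to compare")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sample texts, invented for this example.
samey = [
    "a seamless and powerful platform",
    "a powerful and seamless platform",
    "a seamless powerful platform",
]
varied = [
    "invoices sync nightly at two",
    "our cat knocked over the router",
    "the bridge closed for repairs",
]

print(mean_pairwise_similarity(samey) > mean_pairwise_similarity(varied))
```

A claim like "33% more similar" is a comparison of two averages of this kind: one computed over AI-generated pages, one over human-written pages.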

What the data does NOT confirm (the surprise)

Here's the twist that should make you trust the rest. The same Stanford-led study found no statistically significant increase in stylistic homogeneity tied to AI prevalence, even though 83% of people surveyed believed AI was flattening everyone into one generic voice. That was the single most widely held belief they tested, and the data didn't back it.

Read that carefully: the thing nearly everyone is sure about — "AI is making all writing sound identical" — is the thing the evidence couldn't confirm. Meaning converges; style, measurably, doesn't (yet).

So when someone says "everything online looks and sounds the same now," the accurate response is: some of it does, in specific measurable ways, and one of the loudest claims about it isn't actually supported. That distinction is the whole point. It's also why the visual-design version of this — the purple-gradient sameness — has to be argued from mechanism and example, not from "a study proved it." Because no study proved the visual monoculture. The defaults did.

The cost that's measurable, not debatable

If you want the part of AI slop that isn't an opinion, it's accessibility. Per WebAIM's 2025 analysis of the top million homepages, 96%+ have detectable WCAG failures, and low-contrast text is the most common one, present on 80%+ of sites. AI tools default straight into that failure: the low-contrast CTA is practically a signature. This isn't taste. It's the most common real defect on the web, now shipped faster and at higher volume.
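The contrast check itself is simple arithmetic. Here's a minimal sketch of the WCAG 2.x contrast-ratio formula, assuming colors arrive as six-digit hex strings. The function names are illustrative, and a real audit would read computed colors from the rendered page rather than take them as arguments.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light.

    WCAG 2.x text uses a threshold of 0.03928; the sRGB value 0.04045
    used here gives identical results for 8-bit channels.
    """
    c = channel / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb' or 'rrggbb'."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a six-digit hex color")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """WCAG AA asks for 4.5:1 on normal text and 3:1 on large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


# Light gray text on a white button background: a typical low-contrast CTA.
print(passes_aa("#aaaaaa", "#ffffff"))
```

On white, #767676 is about the lightest gray that still clears 4.5:1, and one step lighter (#777777) falls just short. That narrow margin is why eyeballing contrast fails and measuring it doesn't.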

And there's a trust gap underneath the speed. Figma's 2025 AI report (2,500 users) found 78% say AI boosts their efficiency, but only 32% trust its output. People can feel the difference between fast and good, even when they can't name it.

Why 35% is the number that actually matters

There's a reason the AI-prevalence figure is more than a "dead internet" punchline. The foundational research on model collapse (Nature, 2024) showed that when models train on AI-generated data, they degrade — losing the rare, distinctive cases first. At a few percent of the web, that's an academic worry. At 35% and climbing, the next generation of models is training on a web that's increasingly its own output.

The feedback loop stops being theoretical.
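To see why rare cases go first, here's a toy sketch of the loop. It is not the setup from the Nature paper. It assumes the crudest possible "model", one that memorizes the frequencies in its training data and generates by sampling from them, and the corpus and function name are invented for illustration.

```python
import random


def resample_generations(corpus: list[str], generations: int, seed: int = 0) -> list[int]:
    """Track how many distinct items survive when each generation
    trains only on the previous generation's output.

    "Training" here just means keeping the empirical distribution, and
    "generating" means sampling from it with replacement. An item that
    fails to be sampled once is gone for good.
    """
    rng = random.Random(seed)
    current = list(corpus)
    distinct_counts = [len(set(current))]
    for _ in range(generations):
        current = rng.choices(current, k=len(current))
        distinct_counts.append(len(set(current)))
    return distinct_counts


# Hypothetical corpus: one dominant phrasing plus ten one-off rare ones.
corpus = ["common"] * 90 + [f"rare_{i}" for i in range(10)]
print(resample_generations(corpus, generations=20))
```

The count of distinct items can only hold steady or fall, never rise, because nothing outside the previous generation's output can be sampled. The dominant item is almost never the one that disappears.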

Those are the real stakes of slop: not that one purple landing page is ugly, but that a web averaging itself becomes the training data for the thing that averages it next.

What to actually do about it

You can't fix the macro trend. You can refuse to add to it, and the move is the same one this whole series keeps landing on: generate fast, then verify against something real before you ship. For design specifically, that means measuring the rendered page — contrast, hierarchy, the slop patterns — instead of trusting that it "looks fine." That's what Pixelslop does. The macro web will do what it does; your corner of it doesn't have to be part of the 35%.

Frequently Asked Questions

How much of the internet is AI-generated?

By mid-2025, roughly 35% of newly published websites were AI-generated or AI-assisted, up from essentially zero before ChatGPT's late-2022 launch, according to a Stanford, Imperial College London, and Internet Archive study analyzing 33 months of Wayback Machine data with the Pangram detector.

Is AI really making the whole web look and sound the same?

Partly, and not in the way most people think. The data confirms AI content is more semantically similar (33% higher) and more relentlessly positive (107% higher sentiment), and that models are less diverse than a web search. But the same research found no statistically significant rise in stylistic homogeneity — even though 83% of people believed it existed. Meaning converges; writing style, measurably, hasn't.

What's the actual harm of AI slop, then?

The clearest, most measurable harm is accessibility: low-contrast text is the most common WCAG failure on the web (80%+ of homepages), and AI defaults ship it constantly. The longer-term risk is model collapse — at 35% AI prevalence, new models increasingly train on AI output, degrading diversity over time.

Part of a series on AI design slop: What Is AI Design Slop? · Why Every AI-Built Website Is Purple · How to Tell If a Website Was AI-Generated
