AI is Drowning Peer Review


Last summer, Peter Degen got a call from his postdoctoral supervisor. The news was weird. One of his older papers was getting cited too much.

Published in 2017, it looked at statistical accuracy in epidemiological data. Over the years it had racked up a few dozen citations. Respectable. Boring, even. But suddenly it was being referenced every few weeks. Hundreds of times. It shot up to the top of the list.

Degen dug into it. The citing papers had a pattern. They were all scraping the Global Burden of Disease dataset from the University of Washington. Same data. Different trivial questions. Will people in China fall down more? Do guys who eat zero whole grains get colon cancer? What’s the risk of stroke for anyone over twenty?

Endless variations on a theme.

Degen followed the code trail to a company on the Chinese social platform Bilibili. They were selling tutorials on how to crank out publishable papers in under two hours using AI tools. The output wasn’t great. A subset of studies on headaches was rife with errors. But they weren’t stupidly wrong. Not like the early days of AI slop where hallucinations screamed “fake.”

“It’s a huge burden on the peer review system,” Degen said.

He wasn’t joking. There aren’t enough reviewers. If Large Language Models keep mass-producing papers, the whole thing snaps.

The Gray Area

Paper mills have been around for a decade. Black-market operations selling authorship to desperate doctors or academics wanting a CV boost. Publishers and science sleuths played whack-a-mole. They closed one hole. The mills dug another.

Early generative AI helped them dodge plagiarism checks by spinning text and faking images. But those fakes were sloppy. Rats with absurdly large genitals labeled “testtomcels.” Prose that still contained the phrase “as an AI assistant.” Easy to spot.

Now the tools have matured.

AI can write a convincing paper almost on autopilot. Desperate researchers don’t even need a paper mill. They can just run their own. The result is a deluge.

Matt Spick at the University of Surrey noticed the shift. He got three papers analyzing the US NHANES dataset. They were strikingly similar. He checked Google Scholar. An explosion. All of them mining public data for tiny correlations. Does eating walnuts help your brain? Does skim milk cause depression?

Spick called it scientific spam.

“If you have enough computing power, you measure every pairwise association,” he said. “Eventually you find something unexplored and you publish it.”

Correlations that mean nothing. One study linked years of education to postoperative hernias.

What am I supposed to do? Drop out of school to avoid hernias? It’s noise.
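Spick’s “measure every pairwise association” strategy is the classic multiple-comparisons problem, and a toy simulation makes it concrete. The Python sketch below is an illustration, not Spick’s method or anyone’s actual pipeline. It assumes 40 made-up variables of pure random noise with 200 rows each, and it flags any pair whose correlation passes a roughly p < 0.05 test, approximated with the Fisher z-transform. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def looks_significant(r, n, z_crit=1.96):
    """Approximate two-sided test at about p < 0.05 via the Fisher z-transform."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def mine_noise(n_rows=200, n_vars=40, seed=0):
    """Return (pairs tested, pairs that pass the test) for purely random data."""
    rng = random.Random(seed)
    # Every "variable" is independent Gaussian noise: no real associations exist.
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_rows)] for _ in range(n_vars)]
    tested = 0
    hits = []
    # Test every pair of variables, the brute-force approach Spick describes.
    for i, j in combinations(range(n_vars), 2):
        tested += 1
        r = pearson_r(data[i], data[j])
        if looks_significant(r, n_rows):
            hits.append((i, j, round(r, 3)))
    return tested, hits


if __name__ == "__main__":
    tested, hits = mine_noise()
    print(f"{tested} pairs tested, {len(hits)} 'significant' at p < 0.05")
```

There is nothing real in this data, yet roughly one pair in twenty should still clear the bar. With 780 comparisons, that is typically a few dozen “findings,” each of which could be dressed up as a paper.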

Sleuths used to look for “tortured phrases” like “reinforcement getting to know.” Nonsense born of synonym-spinners. Now that doesn’t work. Spick’s approach has been to look for template papers mining the same databases. But now even the templates are disappearing.

The Tools Get Smarter

Some journals started banning submissions based on public datasets last year. They’re drowning.

But the flood is changing shape. AI agents now exist that can analyze data, form hypotheses, and write the paper with high autonomy. OpenAI announced a tool called Prism, promising to do for science what AI did for coding in 2025.

Spick tested it. He gave Prism data from a published paper on eggplant and pepper ripening.

Prism analyzed it. Proposed a new stat method. Wrote a full paper with charts and real citations.

It took twenty-five minutes.

Spick and his colleagues stared at the screen. “This is actually decent.” It didn’t follow the usual mold. It wasn’t a flagrant hallucination.

How do we filter this out?

Does it matter who wrote the paper if the facts are right?

“Science is supposed to be a filter,” Spick said. “We publish the interesting stuff. Not literally everything.”

If we publish everything, we’re just spamming the world with data. Half of it might not be knowledge. Just noise. And in a year or two, who can tell the difference?

The System Is Breaking

Marit Moe-Pryce edits Security Dialogue. Submissions are up 100%.

Worse: they’re all good now. No more leftover prompts. No obvious errors. The text is coherent, structured, polished. Is it a bot? A young scholar? An expert? Hard to say.

She calls it a gray mass.

“The fraudulent side and the academic side are conflating.”

She has to wade through the grayness to find the real work. She caught a fake citation recently. It listed former editors writing about a topic they never touched. Plausible. Dead on. She only caught it by cross-referencing.

She had to check whether the cited papers were the ones an expert would actually consider relevant. AI cites real papers now. It just might not pick the ones an insider would use.

It’s detailed work. Manual labor. And there is so much of it.

Reviewers work for free. They volunteer. They are tired. Moe-Pryce sends out dozens of requests and gets nothing. Or maybe two replies from twenty tries.

David Resnik at Accountability in Research sees a 60% spike in submissions. He’s besieged by papers about fraudulent papers. Irony aside.

He’s also left guessing whether the peer reviews he receives are AI-written. One survey found that more than half of researchers use AI when reviewing.

The volume is rising. The pool of PhDs capable of reviewing it isn’t.

A study in Quantitative Science Studies showed exponential growth in published papers. Not because science is advancing faster. Because the incentive structure rewards quantity.

Open-access journals charge publication fees. They have no reason to limit volume. AI provides a near-infinite supply. The human side is running out of time.