Irony Alert: Hallucinated Citations Found in Papers from NeurIPS – When AI Experts Get Fooled by AI

Estimated reading time: 8 minutes

  • Around 100 hallucinated citations were discovered across roughly 51 to 53 papers at NeurIPS 2025.
  • GPTZero scanned all 4,841 accepted papers and found fabricated references in approximately 1.1% of papers.
  • The fake citations included total fabrications, Frankenstein combinations of real papers, and lazy placeholder names.
  • This scandal reveals the dangers of relying on AI tools without verification and fact-checking.
  • For startups and founders, this serves as a critical lesson: accuracy is currency in fundraising and business.
  • The incident is prompting conferences like ICLR to implement AI detection tools during the review process.

Imagine the world’s best chefs getting caught using plastic fruit in their prize-winning pies. Or imagine a driving instructor crashing their car because they were texting. It sounds ridiculous, right? Well, something just as wild has happened in the world of technology.

Irony alert: Hallucinated citations found in papers from NeurIPS, the prestigious AI conference.

This isn’t just a small typo. This is a massive story that has the tech world buzzing with shock, laughter, and a little bit of worry. The very people who are building the smartest Artificial Intelligence (AI) systems in the world—the experts who warn us that AI can make mistakes—have been caught making those exact mistakes themselves.

In this post, we are going to dig deep into this scandal. We will look at what happened, how they got caught, and why it matters to everyone, especially if you are a startup founder trying to navigate the world of data and fundraising.

First, let’s set the stage. To understand why this is such a big deal, you need to know what NeurIPS is.

NeurIPS stands for “Neural Information Processing Systems.” That is a very fancy name, but think of it this way: it is the Super Bowl, the Olympics, and the Oscars of the Artificial Intelligence world all rolled into one. It is the place where the smartest minds from Google, Facebook (Meta), OpenAI, and top universities gather to show off their latest inventions.

Getting a paper accepted at NeurIPS is incredibly hard. It is like winning a gold medal. It can get you a job at a top tech company or millions of dollars in funding for your lab.

So, you would expect the work presented there to be perfect, right? You would expect every fact to be checked and every source to be real.

Well, think again.

A startup called GPTZero, which builds tools to detect AI-written text, decided to do a little detective work. They scanned all 4,841 papers accepted to the NeurIPS 2025 conference. What they found was shocking.

They discovered around 100 hallucinated or fabricated citations across roughly 51 to 53 papers.

Let that sink in for a second. These papers had already passed “peer review.” That means other experts looked at them and said, “Yes, this is good science.” But hidden inside the bibliographies (the list of books and papers they read) were ghosts. Papers that do not exist. Authors who aren’t real.

You might be wondering, “What does it mean for a citation to be hallucinated?”

When we talk about humans, hallucinating means seeing things that aren’t there. When we talk about AI, it means something similar. Large Language Models (LLMs)—like the ones that power ChatGPT—are basically very advanced autocomplete engines. They guess which word comes next.

Sometimes, they guess wrong.

If you ask an AI to “write a list of sources for a paper on robot vision,” it might try to be helpful. It knows what a citation looks like. It knows the names of famous researchers. So, it mixes them together.
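To make that concrete, here is a toy sketch in Python. A real language model looks nothing like this inside, but the failure mode is similar: every fragment below is individually plausible, so gluing them together at random produces a citation that looks polished without corresponding to any actual paper.

```python
import random

# Toy illustration only: not how an LLM works internally.
# Real author names, plausible-sounding topics, and real venues,
# recombined at random, yield a convincing but nonexistent reference.
real_authors = ["Y. Bengio", "F. Li", "I. Goodfellow"]
plausible_topics = ["Robust Robot Vision", "Self-Supervised Depth Estimation"]
venues = ["NeurIPS", "ICLR", "CVPR"]

fake_citation = (
    f"{random.choice(real_authors)}. "
    f"\"{random.choice(plausible_topics)} at Scale.\" "
    f"{random.choice(venues)}, {random.randint(2017, 2023)}."
)
print(fake_citation)  # a "Frankenstein": real name, invented paper
```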

According to the reports, the fake citations found at NeurIPS fell into three funny, but worrying, categories:

  1. The Total Fake: The AI made up a paper title, invented authors, and created a link that goes nowhere. It looks real, but it is pure fiction.
  2. The Frankenstein: This is when the AI takes a real paper but changes the details. Maybe it adds an author who didn’t work on it. Maybe it changes the year. It stitches together parts of real things to make a fake thing.
  3. The “John Doe”: Some were just lazy. They listed authors like “John Smith” or “Jane Doe” as placeholders and forgot to fix them.

These are the kinds of mistakes you might expect from a student rushing to finish homework five minutes before class. You do not expect them from the world’s elite scientists.

How did they find these needles in the haystack?

GPTZero is usually known for checking if an essay was written by a computer. But for this project, they did something different. They didn’t just check the writing style; they checked the facts.

Their tool scanned the PDF of every paper. It pulled out every single reference in the bibliography. Then, it cross-checked those references against real databases—the places where all real scientific papers are stored.

It was a simple question: Does this paper exist? Yes or No.

If the database said “No,” or if the details didn’t match, the system flagged it.
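GPTZero has not published its exact pipeline, so treat the following as a minimal sketch of the idea rather than their implementation. It uses Python and the public Crossref API (our choice for illustration; any scholarly search index would do): look each reference up, and flag it when no close match comes back.

```python
import requests

def citation_exists(title: str, author_surname: str):
    """Look a citation up on Crossref and return the best match,
    or None if nothing plausible exists (a possible hallucination)."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 3},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json()["message"]["items"]:
        found_title = (item.get("title") or [""])[0].lower()
        authors = {a.get("family", "").lower() for a in item.get("author", [])}
        # Require both a title match and the claimed author, so that
        # "Frankenstein" citations (real title, wrong author) get flagged too.
        if title.lower() in found_title and author_surname.lower() in authors:
            return item
    return None  # flagged: escalate to a human reviewer, don't auto-accuse

# Example: a real paper should be found; an invented one should not.
print(citation_exists("Deep Residual Learning for Image Recognition", "He"))
print(citation_exists("Quantum Pie Baking for Robots", "Doe"))
```

A production version would add fuzzy title matching and a second database, which is exactly why GPTZero still had a human verify every flagged hit.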

GPTZero claims their tool is over 99% accurate for this kind of work. To be absolutely sure, a human researcher on their team manually checked every single flagged citation. They wanted to make sure they weren’t accusing top scientists of cheating without proof.

And the proof was there: around 100 fake sources, confirmed by hand.

Now, we need to be fair. NeurIPS is huge.

  • Total papers scanned: 4,841
  • Papers with fakes: About 51 to 53

That is roughly 1.1% of the accepted papers.
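The arithmetic behind that figure is worth a quick sanity check, since two slightly different paper counts were reported:

```python
# Flagged papers divided by total papers scanned, for both reported counts.
total_scanned = 4841
for flagged in (51, 53):
    print(f"{flagged} / {total_scanned} = {flagged / total_scanned:.1%}")
# 51 / 4841 = 1.1%
# 53 / 4841 = 1.1%
```

Either way you count it, the rate rounds to about 1.1%.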

You might say, “Hey, 1% isn’t that bad!” And in some things, you would be right. If you got 99% on a math test, you would be thrilled.

But science is different. In science, trust is everything. A research paper is like a brick in a wall. If one brick is fake, the wall gets weaker. Citations are the “audit trail” of science. They show where you got your information. If you make up your sources, you are breaking the rules of the game.

Also, consider how competitive this conference is. There were 21,575 submissions to NeurIPS 2025. Only about 24.5% got in.

That means for every paper that got accepted, roughly three were rejected. Imagine being a researcher who did everything right, checked every source, and worked for months, only to get rejected—while someone else got in with a paper full of made-up book titles. That stings.

Why would smart people do this?

The answer tells us a lot about how we use AI today.

Researchers are busy. They are under huge pressure to publish as much as possible. This is often called “publish or perish.” If you don’t publish new papers, you might lose your funding or your job.

Writing the bibliography is the most boring part of writing a paper. You have to find the exact title, the volume number, the page numbers, and the year. It is tedious work.

So, it seems many researchers took a shortcut. They probably pasted their paper into ChatGPT or a similar tool and said, “Hey, generate a bibliography for me.”

The AI, wanting to please the user, did exactly that. It generated a list that looked perfect. And because the researchers were busy (or maybe a little lazy), they didn’t check. They just copied, pasted, and hit “submit.”

It is a classic case of trusting the machine too much.

This problem is spreading like a virus. It isn’t just happening at NeurIPS.

GPTZero also looked at another major conference called ICLR (International Conference on Learning Representations). They found around 50 hallucinated citations in papers submitted there, too.

The good news is that ICLR decided to fight back. They partnered with GPTZero to run these checks during the review process. They are making “AI hygiene” part of their rules.

This shows that the industry is waking up. They are realizing that just because an AI is smart, it doesn’t mean it is always truthful.

You might be reading this and thinking, “I’m not a scientist. I’m building a startup. Why do I care about fake bibliographies?”

This story is a massive lesson for anyone in business, especially for founders raising capital.

At HeyEveryone.io, we think about this all the time. Our business is built on using AI to help founders reach investors. But there is a right way to use AI, and a wrong way.

The researchers at NeurIPS used AI the wrong way: they used it to invent facts rather than process them.

When you are fundraising, accuracy is your currency. If you send an email to an investor claiming you have a partnership that you don’t have, or citing market data that doesn’t exist, you are done. Your reputation is destroyed instantly. Investors talk. If you “hallucinate” your traction or your market size, you won’t just get a rejection; you will get blacklisted.

This is why the approach we take at HeyEveryone is so different from just asking ChatGPT to “write a cold email.”

1. Verification vs. Creation:
The mistake the researchers made was asking AI to create data (citations). At HeyEveryone, we use AI to find and analyze existing real-world data. We scan vast datasets to find investors who actually exist and have a history of investing in your specific sector. We don’t guess; we verify.

2. The Human Element:
The NeurIPS scandal happened because humans stopped looking. They let the autopilot fly the plane into the mountain. In fundraising, you cannot be on autopilot. Our system drafts highly personalized emails based on real news mentions and social activity, but the founder is still the pilot. The AI provides the superpower, not the hallucination.

3. Efficiency without Laziness:
There is a difference between being efficient and being sloppy. Using AI to scan thousands of investors in seconds is efficient. Using AI to make up a fake reason why they should like you is sloppy.

The NeurIPS story proves that even the smartest people can fall into the trap of “lazy AI.” It serves as a warning: AI is a tool, not a magic wand.

So, what did the conference organizers say when they were caught?

NeurIPS released a statement acknowledging the findings. They said that while they are piloting new policies, the presence of a fake citation doesn’t automatically mean the science in the paper is wrong.

Their argument is: “Okay, the author messed up the reference list, but the math and the experiments in the paper might still be correct.”

That is true, in theory. If I bake a delicious cake but write the wrong brand of flour on the recipe card, the cake is still delicious. But would you trust me to bake your wedding cake if I made up ingredients that don’t exist? Probably not.

It creates a “boy who cried wolf” situation. If I can’t trust your citations, how can I trust your data? Did you make up the test results too? Did you hallucinate the improvement in your algorithm?

This scandal highlights a few major problems that go beyond just one conference:

  • Wasting Time: Imagine being a student trying to learn about AI. You read a top paper, see a citation, and try to find it. You spend hours searching, asking librarians, and digging through archives. But you can’t find it because it doesn’t exist. It is a massive waste of human potential.
  • Polluting the Record: If one paper cites a fake paper, and then another paper cites the first paper, the fake information spreads. It becomes like a rumor that everyone believes is true.
  • The Reputation of AI: This is the most ironic part. These scientists are the ones telling the world, “We need to be careful with AI.” They are the ones telling governments how to regulate AI. But if they can’t even check their own homework for AI mistakes, it looks bad. It looks like the watchdogs are asleep.

This story is going to change how science is done.

We are going to see a lot more “AI Police” tools like GPTZero being used. Just like schools use software to check for plagiarism, conferences will use software to check for hallucinations.

Reviewers—the people who read the papers before they are published—will have to be more careful. They can’t just skim the bibliography anymore. They have to check the links.

For the rest of us, it is a reminder to always double-check, whether you are a student writing an essay, a journalist writing an article, or a founder writing a pitch deck.

The headline “Irony alert: Hallucinated citations found in papers from NeurIPS” will be remembered as a turning point. It is the moment we realized that being an expert in AI doesn’t make you immune to the tricks of AI.

It is funny, yes. But it is also a serious wake-up call.

At HeyEveryone, we believe that the future belongs to those who use AI to amplify the truth, not invent it. We help founders cut through the noise by using data that is real, relevant, and verified. We save you time—months of it—without cutting corners on accuracy.

Because in the end, whether you are trying to get a paper published at NeurIPS or trying to raise a seed round for your startup, credibility is everything. Don’t let a hallucination ruin yours.

The researchers at NeurIPS learned this lesson the hard way. Let’s make sure we learn from their mistake. AI is an incredible co-pilot, but never let it fly the plane while you take a nap.

What is a hallucinated citation?
A hallucinated citation is a reference to a paper, book, or author that does not actually exist. AI tools sometimes generate citations that look real but are completely fabricated or contain incorrect details mixed with real information.

How did GPTZero discover the fake citations at NeurIPS?
GPTZero scanned all 4,841 accepted papers at NeurIPS 2025, extracted every citation from the bibliographies, and cross-checked them against legitimate academic databases. Any citation that couldn’t be verified was flagged and then manually reviewed by human researchers.

Does a fake citation mean the entire research paper is invalid?
Not necessarily. A hallucinated citation in the bibliography doesn’t automatically invalidate the core findings or experiments in a paper. However, it does raise serious questions about the rigor and trustworthiness of the work and damages the credibility of the authors.

Why would AI researchers use AI to generate their citations?
Creating bibliographies is tedious and time-consuming work. Under pressure to publish quickly (the “publish or perish” culture), some researchers may have used AI tools like ChatGPT to speed up the process, trusting the output without properly verifying it.

What lessons can startup founders learn from this scandal?
For founders, especially those fundraising, this highlights the critical importance of accuracy and verification. Just as researchers shouldn’t fabricate citations, founders shouldn’t exaggerate metrics or claim partnerships that don’t exist. In both science and business, credibility is currency, and hallucinations—whether from AI or intentional misrepresentation—can destroy trust instantly.

Research for this article was sourced from The Outpost and Yahoo News.
