If you’ve used AI tools in your business — for content, customer support, research, or internal workflows — you’ve probably encountered the problem firsthand. The model confidently gives you an answer that turns out to be completely wrong. A made-up statistic. A non-existent product feature. A reference to a case study that never happened.

This is what the AI world calls a hallucination, and it’s one of the biggest reasons businesses remain cautious about deploying AI at scale.

Now, a new piece of research from Tsinghua University has done something genuinely interesting: instead of looking at hallucinations from a big-picture perspective — training data, model design, feedback processes — the researchers went microscopic. They looked inside the neural network itself and asked: which specific neurons are actually responsible for hallucinations?

The paper, “H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs”, published in December 2025, is one of the most detailed explorations yet of hallucination at the computational level. And for businesses working with AI, the findings are worth understanding — even if you’re not a machine learning engineer.

Why Hallucinations Have Been So Hard to Fix

Before getting into the research findings, it’s worth quickly recapping why hallucinations are such a persistent problem.

Most AI language models are trained on enormous amounts of text, and they learn to predict what words or sentences should come next. This makes them very good at producing fluent, coherent-sounding output. But fluency and factual accuracy are not the same thing.

Previous research has pointed to several contributing factors: gaps and biases in training data, training methods that reward confident answers rather than honest uncertainty, and the way AI generates text one token at a time, which means a small error early on can snowball into a bigger one. Models like GPT-3.5 have been shown to hallucinate in roughly 40% of citation-based fact checks, and even GPT-4 comes in around 28.6%, according to a comparative analysis published in the Journal of Medical Internet Research.

What’s been largely missing from the conversation is a neuron-level explanation. What’s actually happening inside the model when it hallucinates? That’s the gap this new research tries to fill.

What the Research Found: Meet the H-Neurons

The Tsinghua team identified what they call H-Neurons — a very small subset of neurons inside an AI model’s feedforward networks that are strongly associated with hallucinations.

Here’s what makes this finding striking: these neurons make up less than 0.1% of all the neurons in a model. In some of the larger models tested, the proportion was as low as 0.01%. Despite being such a tiny fraction of the overall network, this group of neurons carries enough signal to reliably predict whether the model is about to produce a hallucinated response or a factually accurate one.

The researchers tested this across six different AI models — including models from the Mistral, Gemma, and Llama families — and found that classifiers built on these H-Neurons consistently outperformed classifiers built on randomly selected neurons, often by more than 10 percentage points in accuracy.

Even more significantly, the neurons identified from one type of question (general trivia) continued to predict hallucinations when tested on completely different datasets — including biomedical questions (BioASQ) and questions about entirely made-up entities that couldn’t possibly exist in any training data. That kind of generalization suggests these neurons aren’t just picking up on a quirk in one dataset. They seem to capture something fundamental about how hallucinations form.
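As a toy illustration of the probe idea — not the paper’s actual procedure — the sketch below simulates activations for a few thousand “neurons” in which only a handful carry a hallucination signal, then fits a small logistic-regression probe on that handful versus an equally sized random subset. All the numbers (neuron counts, signal strength, learning rate) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 4096 "neurons", of which only 4 (under 0.1%) carry a
# hallucination signal. Purely illustrative, not from the paper.
n_neurons, n_signal, n_samples = 4096, 4, 2000
labels = rng.integers(0, 2, n_samples)             # 1 = hallucinated response
acts = rng.normal(size=(n_samples, n_neurons))
signal_idx = rng.choice(n_neurons, n_signal, replace=False)
acts[:, signal_idx] += 1.5 * labels[:, None]       # signal neurons shift with label

def probe_accuracy(idx):
    """Fit a tiny logistic-regression probe on the chosen neuron subset."""
    X = acts[:, idx]
    w, b = np.zeros(len(idx)), 0.0
    for _ in range(300):                           # plain gradient descent
        p = 1 / (1 + np.exp(-(X @ w + b)))
        g = p - labels
        w -= 0.1 * X.T @ g / n_samples
        b -= 0.1 * g.mean()
    p = 1 / (1 + np.exp(-(X @ w + b)))
    return ((p > 0.5) == labels).mean()

# A random subset of the same size, excluding the signal neurons.
random_idx = rng.choice(np.setdiff1d(np.arange(n_neurons), signal_idx),
                        n_signal, replace=False)
acc_signal = probe_accuracy(signal_idx)
acc_random = probe_accuracy(random_idx)
print(f"signal-neuron probe: {acc_signal:.2f}, random probe: {acc_random:.2f}")
```

The point of the sketch is only the shape of the experiment: if a tiny subset really does carry the signal, a probe restricted to it should clearly beat a probe on random neurons, which is the pattern the paper reports.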

It’s Not Just About Getting Facts Wrong

Perhaps the most counterintuitive part of the research is what happened when the researchers deliberately manipulated these neurons.

They ran a series of controlled experiments where they artificially amplified or suppressed H-Neuron activity during inference — while the model was generating a response — without retraining the model at all. And what they found was revealing.

When H-Neurons were amplified, models didn’t just become more likely to hallucinate facts. They became more compliant across the board, going along with user inputs even when accuracy, safety, or integrity suffered.

Conversely, suppressing these neurons made the models more resistant to those same compliance failures.
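A minimal sketch of this kind of inference-time intervention, using a toy feedforward block with made-up weights and neuron indices (the paper’s identification procedure is far more involved):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for one feedforward (FFN) layer; weights, neuron
# indices, and scale factors are all invented for illustration.
W1 = rng.normal(size=(8, 32))
W2 = rng.normal(size=(32, 8))
h_neurons = np.array([3, 17])        # pretend these are the H-Neurons

def ffn(x, h_scale=1.0):
    """Forward pass that edits selected hidden activations in flight."""
    h = np.tanh(x @ W1)              # hidden activations (tanh for simplicity)
    h[h_neurons] *= h_scale          # amplify (>1) or suppress (<1)
    return h @ W2

x = rng.normal(size=(8,))
base = ffn(x)                        # untouched forward pass
amplified = ffn(x, h_scale=3.0)      # boosted H-Neuron activity
suppressed = ffn(x, h_scale=0.0)     # H-Neurons silenced

# The weights never change: only activations are edited at inference time.
print(np.allclose(base, ffn(x)))     # prints True
```

The design choice worth noticing is that nothing is retrained: the same frozen weights produce different behaviour purely because a handful of activations are scaled mid-forward-pass.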

What this suggests is that hallucinations aren’t an isolated “fact-checking failure.” They appear to be one symptom of a broader tendency the researchers call over-compliance: the model’s inclination to give people what they seem to want, even when doing so means sacrificing accuracy, safety, or integrity. In other words, the model behaves a little like a people-pleaser — prioritizing the appearance of a helpful answer over the reality of whether that answer is true.

Where Do These Neurons Come From?

A natural follow-up question is: when do these neurons develop? Are they introduced during fine-tuning — the stage where a base model is refined with human feedback and instruction-following data — or are they already there in the underlying pre-trained model?

The research team’s answer is fairly clear: H-Neurons are present before fine-tuning begins.

When they took classifiers trained on instruction-tuned models and applied them directly to the original base models, the classifiers still worked. The neurons still predicted hallucinations, even in models that had never been fine-tuned. And when they looked at how much these specific neurons changed during the fine-tuning process, they found that H-Neurons tend to remain relatively stable — they show less parameter change than the average neuron in the network.

This tells us something important about where the hallucination problem actually sits. It isn’t primarily introduced by the alignment or instruction-tuning process. It appears to be baked in from the start, during the base pre-training phase, as a consequence of training a model to predict the next token in a sequence without any mechanism to distinguish between factually correct and incorrect continuations.

This finding aligns with theoretical work on why hallucinations may be an inherent feature of how language models learn, rather than an implementation bug that can simply be patched away. A paper titled “Why Language Models Hallucinate” by researchers including Adam Kalai and Santosh Vempala argues from a learning-theory perspective that hallucinations are in some sense inevitable given the structure of next-token prediction training. The H-Neuron research provides empirical, neuron-level evidence that supports this view.

What This Means for Businesses Using AI

So what should you actually take away from this if you’re a business leader, marketing manager, or digital transformation lead thinking about AI deployment?

1. Hallucinations Are Structural, Not Just Accidental

One common misconception is that hallucinations are a bug that will eventually be engineered away entirely as models get bigger and better. This research suggests the reality is more complicated. These behaviours appear to be rooted in the fundamental architecture of how models are trained, not in a surface-level flaw that can be patched. That doesn’t mean progress isn’t being made — it is — but it does mean that building AI systems for your business should involve a realistic view of where these risks live, not an assumption that they’ll disappear on their own.

2. Compliance and Hallucination Are Two Sides of the Same Coin

The link between over-compliance and hallucination is particularly relevant for business applications. If your AI system is configured to be maximally helpful and agreeable — which is often the default for customer-facing or assistant tools — it may also be more susceptible to both hallucinating and to going along with user inputs that contain errors or bad assumptions. Understanding this trade-off matters when you’re designing how your AI tools interact with customers, staff, or data.

3. Detection Is Getting More Precise

On the more positive side, this research opens a path toward much better hallucination detection — not just at the output level, but at the generation level. If you can identify which neurons are active during a response, you may be able to flag high-risk outputs in real time, before they reach an end user. This kind of signal could eventually be built into AI deployment infrastructure as a quality control layer, which is genuinely useful for any business where factual accuracy matters.
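A hypothetical quality-control gate along these lines might threshold a risk score computed from a previously identified neuron subset. Every index, threshold, and scoring rule below is an assumption for illustration, not anything the paper specifies:

```python
import numpy as np

# Hypothetical gate: score each generated response by the mean
# activation of a known H-Neuron subset, and hold back responses
# that score above a threshold for human review.
h_neurons = np.array([12, 340, 987])   # assumed indices, for illustration
THRESHOLD = 0.8                        # assumed risk cutoff

def flag_response(activations):
    """Return True if the activation pattern looks high-risk."""
    risk = activations[h_neurons].mean()
    return risk > THRESHOLD

safe = np.zeros(1024)                  # quiet H-Neurons
risky = np.zeros(1024)
risky[h_neurons] = 2.0                 # strongly firing H-Neurons

print(flag_response(safe), flag_response(risky))  # prints: False True
```

In a real deployment this check would sit between generation and delivery, routing flagged responses to a fallback answer or a human reviewer rather than the end user.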

4. Fine-Tuning Alone Won’t Solve This

Many organisations ask whether they can fine-tune their way out of hallucination problems — training a model on their own data and use cases to make it more reliable. The evidence here suggests fine-tuning has limits. If the underlying neurons driving hallucinations are largely unchanged by the fine-tuning process, custom training alone is unlikely to eliminate the risk. It may still improve performance meaningfully, but it needs to be paired with thoughtful system design, output validation, and human review processes.

The Bigger Picture

Research like this represents a genuine shift in how the AI field approaches reliability. For years, the conversation about hallucinations has focused on data quality, model scale, and alignment processes. This work adds a new layer: the ability to look inside the model and identify specific computational structures tied to specific failure modes.

That’s significant not just for researchers but for the broader ecosystem of tools, platforms, and applications built on top of large language models. As this kind of interpretability research matures, it will likely feed into better monitoring tools, smarter deployment guidelines, and more nuanced conversations about where AI can and can’t be trusted to operate autonomously.

At Axient.ai, we think this kind of research matters for anyone making real decisions about AI implementation. The organisations that will get the most value from AI aren’t necessarily the ones that move fastest — they’re the ones that understand what these systems actually do, where they’re reliable, and where they need oversight. Findings like these give us a clearer and more honest picture of both.

Source: Gao et al. (2025). H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs. arXiv:2512.01797.