As artificial intelligence continues to revolutionize software development, a startling new study has uncovered a critical vulnerability in AI-generated code that could pose a significant risk to global software infrastructure. Researchers have found that code written by large language models (LLMs) often contains fictional or nonexistent references to third-party libraries — a phenomenon that dramatically increases the likelihood of software supply chain attacks. These made-up libraries, referred to as “package hallucinations,” offer an enticing attack vector for hackers seeking to introduce malicious code into otherwise trusted applications.
Hallucinated Dependencies: A Trojan Horse in Your Codebase
The research, described in a paper scheduled to be presented at the 2025 USENIX Security Symposium, demonstrates that many of today’s most advanced AI coding assistants generate dependencies that do not exist. The researchers tested 16 of the most widely used LLMs, producing a staggering 576,000 code samples across two major programming languages: Python and JavaScript.
The results were alarming.
Of the 2.23 million package references found within the generated code, approximately 440,000 — nearly 20 percent — were hallucinated. These hallucinated dependencies don’t point to legitimate libraries available on package registries like PyPI or npm. Instead, they reference names that the AI essentially “made up” during generation. If unsuspecting developers attempt to install these fake packages, it creates a golden opportunity for malicious actors to exploit the system by uploading malware under those same names.
This isn’t just a theoretical risk — it’s a real and growing threat to the modern software development pipeline.
Understanding the Supply Chain Attack Vector
To appreciate the severity of this issue, one must first understand how software supply chain attacks work. In the context of modern development, a “dependency” is any external code module that a piece of software relies on to function. Instead of writing every feature from scratch, developers often use open-source libraries to handle tasks like authentication, file manipulation, networking, or machine learning.
This reliance on third-party libraries has led to a complex web of interconnected codebases. While this ecosystem has vastly improved software efficiency and innovation, it also introduces substantial risk. If a dependency is compromised — whether intentionally through malware or inadvertently through poor coding practices — every piece of software that uses it is potentially at risk.
Package hallucinations exacerbate this vulnerability. By generating references to nonexistent packages, LLMs may inadvertently direct developers to install malicious software — especially if an attacker spots the hallucinated name, creates a real package using that name, and uploads it to a public repository. Once that happens, any developer who installs the package based on the AI’s suggestion is unwittingly opening their systems to malicious code.
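To make the failure mode concrete, here is a minimal illustrative sketch (not tooling from the study) that pulls the top-level imports out of an AI-generated Python snippet and asks PyPI’s JSON API whether each name exists. The snippet and the package name fastjsonx are hypothetical, and import names do not always match distribution names (yaml is provided by PyYAML, for example), so treat a miss as a prompt to investigate rather than proof of a hallucination.

```python
import ast
import sys
import urllib.error
import urllib.request

def top_level_imports(source: str) -> set[str]:
    """Collect the top-level module names imported in a code snippet."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def exists_on_pypi(name: str) -> bool:
    """Return True if PyPI's JSON API knows a project with this name."""
    try:
        urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10)
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

# Hypothetical AI-generated snippet; "fastjsonx" is a made-up name standing in
# for a hallucinated dependency.
generated_code = """
import requests
import fastjsonx
"""

for module in sorted(top_level_imports(generated_code)):
    if module in sys.stdlib_module_names:  # standard-library modules never live on PyPI
        continue
    verdict = "found on PyPI" if exists_on_pypi(module) else "NOT on PyPI; investigate before installing"
    print(f"{module}: {verdict}")
```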
The Rise of Dependency Confusion
This kind of attack isn’t entirely new. In fact, a related method called “dependency confusion” was first demonstrated in 2021 in a proof-of-concept attack that successfully injected counterfeit code into internal networks belonging to Apple, Microsoft, and Tesla. The attack worked by uploading malicious packages to public registries using the same names as internal private libraries but with higher version numbers. Because many build systems automatically pull the most recent version of a dependency, they mistakenly used the attacker’s version.
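The mechanics are easy to see in miniature. The toy sketch below (hypothetical package name, deliberately simplified resolver) shows why a “highest version wins” policy across a private and a public index hands the decision to whoever publishes the biggest version number.

```python
from packaging.version import Version  # third-party "packaging" library

# Hypothetical indexes: the private one holds the real internal library, the public
# one holds an attacker's upload under the same name with an inflated version.
internal_index = {"acme-billing": ["1.2.0", "1.3.1"]}
public_index = {"acme-billing": ["99.0.0"]}

def naive_resolve(name: str) -> str:
    """Simplified resolver: merge candidates from both indexes and pick the highest version."""
    candidates = internal_index.get(name, []) + public_index.get(name, [])
    return max(candidates, key=Version)

print(naive_resolve("acme-billing"))  # -> "99.0.0", the attacker's package wins
```

Real installers differ in the details, but pinning exact versions (and hashes where the tooling supports them) removes the ambiguity this attack relies on.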
Package hallucination now offers a fresh twist on this technique. Instead of hijacking legitimate internal names, attackers can simply wait for LLMs to suggest fictional packages, then upload malware under those names and wait for developers to take the bait.
Joseph Spracklen, lead researcher and PhD student at the University of Texas at San Antonio, explained the mechanism in an email to Ars Technica:
“Once the attacker publishes a package under the hallucinated name, containing some malicious code, they rely on the model suggesting that name to unsuspecting users. If a user trusts the LLM’s output and installs the package without carefully verifying it, the attacker’s payload, hidden in the malicious package, would be executed on the user’s system.”
In this way, the attacker doesn’t even need to break into a system — the developer does the hard work for them.
How AI “Hallucinates” Code
In AI research, the term “hallucination” refers to instances where an LLM generates output that is incorrect, irrelevant, or fabricated. Hallucinations in natural language output are already well-documented. However, hallucinations in code — especially in the form of made-up dependencies — are particularly dangerous because code tends to be treated as more deterministic and less prone to “creative” error.
Unfortunately, the study shows that code hallucination is not only common but alarmingly consistent.
According to the data, 43% of hallucinated packages recurred across all 10 repetitions of the same query, and 58% of the time a hallucinated package appeared more than once within those 10 iterations. This persistence indicates that hallucinations are not just random one-off glitches — they follow predictable patterns. For attackers, this means they can identify commonly hallucinated names and register those packages in bulk to maximize their potential reach.
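As a rough illustration of how that consistency can be measured (this is not the paper’s actual pipeline, and the package names are invented), one can re-run the same prompt several times, collect the suggested packages that fail to resolve, and count how often each name recurs:

```python
from collections import Counter

# Invented example data: unresolvable package names collected from 10 runs of one prompt.
hallucinated_per_run = [
    {"fastjsonx", "easyhttp3"},
    {"fastjsonx"},
    {"fastjsonx", "quickml-utils"},
    # ... remaining runs omitted
]

counts = Counter(name for run in hallucinated_per_run for name in run)
recurring = {name: n for name, n in counts.items() if n > 1}
print(recurring)  # names that keep coming back are the predictable ones worth defending against
```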
Which Models and Languages Are Most Vulnerable?
Not all LLMs performed equally in the study. Open-source models, such as Meta’s CodeLlama and the DeepSeek models, had significantly higher hallucination rates than commercial models like OpenAI’s GPT series or Google’s Gemini.
On average:
- Open-source models hallucinated nearly 22% of their dependencies.
- Commercial models were much more accurate, with only about 5% hallucination rates.
The discrepancy likely stems from the sheer size and scope of the commercial models. According to the researchers, commercial LLMs often have 10 times more parameters than their open-source counterparts, allowing them to recall more accurate information and reduce errors. Furthermore, companies like OpenAI and Anthropic use additional fine-tuning, safety layers, and instruction training that improve output reliability — something that open-source models currently lack due to resource constraints.
Another key finding: hallucination rates also varied by programming language.
- Python code generated by LLMs exhibited an average hallucination rate of 16%.
- JavaScript code was worse, with hallucinations averaging over 21%.
The likely explanation? The JavaScript ecosystem has a larger, more complex package registry — with more than ten times as many packages as Python’s PyPI. This complexity leads to greater uncertainty within the models, resulting in higher hallucination rates.
The Implications for Developers and the Future of Software Security
The implications of this study are profound. With Microsoft CTO Kevin Scott predicting that 95% of all code will be AI-generated within the next five years, developers are likely to rely increasingly on AI assistance to scaffold, write, and refactor code. But if AI tools are hallucinating dependencies — and doing so in consistent, exploitable ways — the security of every application that uses them could be at risk.
Here are a few important takeaways for developers and organizations:
- Don’t blindly trust AI-suggested packages. Always verify that a package actually exists in official repositories, and cross-check the URL, documentation, and author metadata before installing (see the sketch after this list).
- Enable strict dependency validation. Tools like pip-audit, npm audit, or dependency scanning via CI/CD pipelines can catch suspicious packages before they make it into production.
- Lock your dependencies. Use lock files (package-lock.json, Pipfile.lock, etc.) to pin exact package versions and avoid automatically pulling in newer or malicious alternatives.
- Monitor public registries for suspicious uploads. Organizations should watch for package names that closely resemble internal or frequently used libraries, especially if they appear suddenly and lack documentation.
- Push for LLM safety improvements. The open-source AI community must prioritize improvements in hallucination detection, training data quality, and fine-tuning to mitigate security risks in generated code.
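As a minimal sketch of the first takeaway (assuming a Python package and PyPI; the suggested name below is a placeholder), pull a project’s public metadata and look it over before running pip install on anything an assistant proposes:

```python
import json
import urllib.error
import urllib.request

def review_package(name: str) -> None:
    """Print basic PyPI metadata so a human can sanity-check a package before installing it."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            print(f"{name}: not on PyPI, possibly a hallucinated name; do not install")
            return
        raise

    info = data["info"]
    print(f"name:     {info['name']}")
    print(f"version:  {info['version']}")
    print(f"summary:  {info.get('summary') or 'not listed'}")
    print(f"author:   {info.get('author') or 'not listed'}")
    print(f"homepage: {info.get('home_page') or 'not listed'}")

review_package("requests")    # long-established project with rich metadata
review_package("fastjsonx")   # placeholder for an AI-suggested name
```

Existence alone is not proof of safety, since attackers can and do register real packages under plausible names; pair this kind of spot check with lock files and registry auditing in CI.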
A New Chapter in Software Supply Chain Risk
This study reinforces the idea that LLMs — while powerful and transformative — are not infallible. In fact, their very usefulness and scale may increase the threat surface for software security. As attackers adapt to the AI era, they’re learning to exploit not just flaws in software code, but flaws in the very tools we use to create it.
For AI-generated code to become safe and trustworthy, developers must remain vigilant. Trust, but verify — especially when the suggestions come from a machine.