45% of AI-generated code samples contain vulnerabilities.
In the largest study of its kind, the Veracode 2025 GenAI Code Security Report found that 45% of AI-generated code samples contained security vulnerabilities, tested across more than 100 large language models and 80-plus coding tasks. The pattern held regardless of model size or recency. This brief collects the verified, citable research so you can see what the data actually says before you ship.
What the research actually shows.
Across 100-plus models and 80-plus tasks, nearly half of all AI-generated code carried a security flaw.
The headline figure comes from the Veracode 2025 GenAI Code Security Report: 45% of AI-generated code samples contained security vulnerabilities, measured across more than 100 large language models and over 80 coding tasks (coverage via Help Net Security).
The failures were not evenly spread. AI models failed to defend against cross-site scripting in about 86% of cases and log injection in about 88% of cases, according to the same research.
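What does "defending" against these two categories look like in practice? The minimal Python sketch below is our own illustration, not code from the Veracode test suite, and the function names (`render_comment`, `sanitize_for_log`, `log_login_attempt`) are hypothetical. It escapes untrusted text before placing it in HTML (against cross-site scripting) and neutralises carriage returns and line feeds before writing user input to a log (against log injection). Both use only the standard library; a real web app would more often rely on a templating engine that auto-escapes by default.

```python
import html
import logging

# Illustrative sketch only; not taken from the Veracode report.


def render_comment(comment: str) -> str:
    """Build an HTML fragment with user input escaped (XSS defence)."""
    # html.escape turns &, <, >, " and ' into entities, so attacker text
    # cannot open a <script> tag or break out of an attribute value.
    return f'<p class="comment">{html.escape(comment, quote=True)}</p>'


def sanitize_for_log(value: str) -> str:
    """Replace CR/LF with visible escapes so input cannot forge extra log lines."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


logger = logging.getLogger("app")


def log_login_attempt(username: str) -> None:
    # Insecure pattern an assistant may produce: logger.info("login attempt: " + username)
    logger.info("login attempt: %s", sanitize_for_log(username))
```

The design choice is to encode at the point of output: the same string needs different treatment for an HTML page than for a log file, so each sink gets its own encoding.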
The most uncomfortable finding is what did not change. Veracode reports that newer and larger models were not more secure than older ones, pointing to a structural problem in how AI generates code, not a limitation the next release will quietly fix. You cannot wait out this risk by upgrading to a bigger model.
Chart: Vulnerable versus clean samples (Veracode 2025, across 100-plus models and 80-plus tasks).
Chart: Where AI code fails most (Veracode 2025, share of cases failing to defend).
The verified numbers
- 45% of AI-generated code. Samples contained security vulnerabilities across 100-plus models and 80-plus tasks (Veracode 2025).
- ~86% cross-site scripting. Share of cases where AI models failed to defend against XSS (Veracode).
- ~88% log injection. Share of cases where AI models failed to defend against log injection (Veracode).
- No size advantage. Newer and larger models were not more secure than older ones (Veracode).
Where the data leaks.
AI-assisted commits leaked secrets at roughly double the baseline rate.
It is not only vulnerable logic that ships. GitGuardian's State of Secrets Sprawl report found that AI-assisted commits leaked secrets at roughly double the baseline rate, about 3.2% compared with about 1.5% across public commits.
Hard-coded API keys, tokens and credentials are exactly the kind of thing an AI assistant will helpfully write inline when it is moving fast and you are not watching closely.
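The fix for inline credentials is to keep them out of source files entirely. Here is a minimal sketch, assuming the secret is supplied through an environment variable; `PAYMENT_API_KEY` and `get_api_key` are hypothetical names used for illustration. In production a dedicated secrets manager is usually a better source than raw environment variables, but the principle is the same: nothing secret is written into the file the assistant just generated.

```python
import os

# Insecure pattern an assistant may write inline (placeholder value):
# API_KEY = "hardcoded-secret-value"


def get_api_key(var_name: str = "PAYMENT_API_KEY") -> str:
    """Read a credential from the environment instead of the source file.

    PAYMENT_API_KEY is a hypothetical variable name for illustration.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail loudly instead of falling back to a hard-coded default.
        raise RuntimeError(f"{var_name} is not set; refusing to start without it")
    return key
```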
The risk compounds in codebases that already have problems. Snyk found that GitHub Copilot can replicate and amplify vulnerabilities that already exist in a codebase. Existing security debt makes AI-assisted output less secure, not more.
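Here is a concrete illustration of how an existing flaw propagates. In this hypothetical Python sketch, SQL injection is our own example rather than a category quoted from the Snyk research, and both function names are invented. If the codebase already builds queries by string formatting, an assistant completing a new lookup function has that pattern in front of it to copy. Fixing the existing helper first means the pattern the assistant sees is the safe one.

```python
import sqlite3


# Hypothetical existing helper with a SQL injection flaw: user input is
# pasted straight into the SQL text, and new code tends to copy this shape.
def find_user_insecure(conn: sqlite3.Connection, name: str) -> list:
    return conn.execute(f"SELECT id, name FROM users WHERE name = '{name}'").fetchall()


# Remediated version: a parameterized query, so sqlite3 passes `name` as data
# and it can never change the structure of the SQL statement.
def find_user(conn: sqlite3.Connection, name: str) -> list:
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()
```

With the input `' OR '1'='1`, the insecure helper returns every row in the table, while the parameterized one returns none.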
Chart: Secret-leak rate in commits (GitGuardian).
Two ways vibe-coded apps bleed
- Leaked secrets, about 2x. AI-assisted commits leaked secrets at about 3.2% versus about 1.5% baseline (GitGuardian).
- Amplified existing flaws. Copilot can replicate and amplify vulnerabilities already in the codebase (Snyk).
Why AI-generated code is insecure.
A model generates code from patterns it has seen. It has no understanding of security.
A language model predicts the next plausible token based on the vast amount of public code it was trained on. Plenty of that public code is insecure, so the model reproduces insecure patterns confidently and fluently. It is optimising for code that looks right, not code that is safe. That is why the finding that bigger models do not help makes sense: scaling the same approach scales the same blind spot.
The human side matters just as much. Researchers at Stanford found that developers using AI assistants wrote less secure code, yet were more likely to believe their code was secure. That false confidence is the dangerous part. Fluent, well-formatted output reads as trustworthy, so the review step that would have caught the flaw gets skipped. The OWASP Top 10 for LLM Applications maps the categories worth checking for.
The root causes
- Pattern, not understanding. Models reproduce patterns from training data, including insecure ones, with no model of security.
- Scale does not fix it. Newer and larger models were not more secure, which signals a structural cause (Veracode).
- False confidence. Developers with AI assistants wrote less secure code yet thought it was more secure (Stanford).
- A known taxonomy. OWASP maintains a Top 10 for LLM Applications as the reference framework for these risks.
What it means for your business.
Vibe coding is genuinely useful. It just cannot be the last step before you ship.
None of this means you should stop building with AI. It means the output needs a security review before it reaches production, the same way you would review code from a fast junior developer who never went to a security class. If nearly half of AI-generated code carries a flaw and secrets leak at double the rate, the cost of skipping review is a breach, a leaked credential, or a customer-data incident you find out about the hard way.
The practical answer is a review gate. Get the code audited against a known framework, fix what the audit finds, and put a repeatable process around AI usage so the next sprint is safe by default.
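One repeatable piece of that gate is an automated secret check that runs before code is merged. The Python sketch below is deliberately small and is not how GitGuardian or any commercial scanner works internally: it checks only two well-known token formats (AWS access key IDs, which start with `AKIA`, and classic GitHub personal access tokens, which start with `ghp_`), and the names `SECRET_PATTERNS`, `find_secrets` and `main` are hypothetical.

```python
import re
from pathlib import Path

# Illustrative detectors only (an assumption, not any vendor's rule set):
# AWS access key IDs ("AKIA" + 16 uppercase letters/digits) and
# classic GitHub personal access tokens ("ghp_" + 36 alphanumerics).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, detector_name) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def main(paths: list[str]) -> int:
    """Scan the given files; return 1 if anything looks like a secret, else 0."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        for lineno, name in find_secrets(text):
            print(f"{path}:{lineno}: possible {name}")
            failed = True
    return 1 if failed else 0
```

To use it as a gate, a pre-commit hook or CI step could call `sys.exit(main(sys.argv[1:]))` with the changed file paths, so a non-zero exit blocks the change. A dedicated scanner adds far more detectors and entropy checks, so treat this as a stand-in for one, not a replacement.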
What to do about it
- Review before you ship. Run a free Vibe Scan
- Audit what already shipped. Get a vibe code audit
- Fix and harden the app. Fix my AI app
- Lock down the process. AI security
- Build safely next time. Build with AI
Frequently asked questions.
How much AI-generated code contains vulnerabilities?
The Veracode 2025 GenAI Code Security Report found that 45% of AI-generated code samples contained security vulnerabilities, measured across more than 100 large language models and over 80 coding tasks, so it is not a one-model fluke. The failures concentrated in common categories: AI models failed to defend against cross-site scripting in about 86% of cases and log injection in about 88% of cases.
Is vibe coding safe?
Vibe coding is safe as a development accelerator, but its raw output is not safe to ship without review. With nearly half of AI-generated code carrying a vulnerability (Veracode) and AI-assisted commits leaking secrets at roughly double the baseline rate, about 3.2% versus 1.5% (GitGuardian), the risk is real. Treat AI output like code from a fast but unsupervised junior developer: useful, but reviewed and hardened before production.
Will a newer or larger model fix the problem?
No. Veracode reports that newer and larger models were not more secure than older ones. That points to a structural problem in how AI generates code rather than a temporary limitation the next release will fix. You need a review and remediation process, not a bigger model.
How do I make AI-generated code safe to ship?
Put a review gate between AI output and production. Run a scan or audit against a known framework such as the OWASP Top 10 for LLM Applications, fix what it finds, rotate any leaked secrets, and add a repeatable process so future AI-assisted work is safe by default. VibeZero offers a free Vibe Scan to start, a vibe code audit to find what shipped insecure, and a fix-and-harden engagement to remediate it.