The State of Vibe-Coded App Security

45% of AI-generated code ships with vulnerabilities.

In the largest study of its kind, the Veracode 2025 GenAI Code Security Report found that 45% of AI-generated code samples contained security vulnerabilities, tested across more than 100 large language models and 80-plus coding tasks. The pattern held regardless of model size or recency. This brief collects the verified, citable research so you can see what the data actually says before you ship.

Key findings, at a glance
  • 45% of AI-generated code samples contained vulnerabilities (Veracode 2025)
  • ~2x the baseline rate of secret leaks in AI-assisted commits (GitGuardian)
  • 86% of AI cases failed to defend against cross-site scripting (Veracode)
  • None: no improvement from newer or larger models (Veracode)
01 The data

What the research actually shows.

Across 100-plus models and 80-plus tasks, nearly half of all AI-generated code carried a security flaw.

The headline figure comes from the Veracode 2025 GenAI Code Security Report: 45% of AI-generated code samples contained security vulnerabilities, measured across more than 100 large language models and over 80 coding tasks (coverage via Help Net Security).

The failures were not evenly spread. AI models failed to defend against cross-site scripting in about 86% of cases and log injection in about 88% of cases, according to the same research.
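To make those two categories concrete, here is a minimal Python sketch of the defence each one needs. It is an illustration under stated assumptions, not code from the Veracode report: the function names `render_comment`, `sanitize_for_log` and `log_login_failure` are hypothetical, and a real app would normally lean on its framework's templating and logging setup.

```python
import html
import logging

logger = logging.getLogger("app")  # hypothetical logger name


def render_comment(comment: str) -> str:
    """Escape user text before placing it in HTML, so markup is shown, not run."""
    return f"<p>{html.escape(comment, quote=True)}</p>"


def sanitize_for_log(value: str) -> str:
    """Neutralise CR and LF so user input cannot forge extra log lines."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


def log_login_failure(username: str) -> None:
    """Log a failed login with the user-supplied name made safe first."""
    logger.warning("login failed for user=%s", sanitize_for_log(username))
```

With this in place, `render_comment("<script>alert(1)</script>")` returns the tag as inert text, and a username containing a newline is written to the log as a single line with a visible `\n`.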

The most uncomfortable finding is what did not change. Veracode reports that newer and larger models were not more secure than older ones, pointing to a structural problem in how AI generates code, not a limitation the next release will quietly fix. You cannot wait out this risk by upgrading to a bigger model.

Chart: Vulnerable versus clean samples (Veracode 2025). Sample-level vulnerability rate, all models tested.
  • 45% carried a vulnerability, across 100-plus models and 80-plus tasks
  • 55% clean, but you cannot tell which is which without a review

Chart: Where AI code fails most (Veracode 2025, share of cases failing to defend).
  • Log injection: 88%
  • Cross-site scripting (XSS): 86%
  • Any vulnerability, all samples: 45%

The verified numbers

  • 45% of AI-generated code: contained security vulnerabilities across 100-plus models and 80-plus tasks (Veracode 2025).
  • ~86% cross-site scripting: share of cases where AI models failed to defend against XSS (Veracode).
  • ~88% log injection: share of cases where AI models failed to defend against log injection (Veracode).
  • No size advantage: newer and larger models were not more secure than older ones (Veracode).
02 Secrets

Where the data leaks.

AI-assisted commits leaked secrets at roughly double the baseline rate.

It is not only vulnerable logic that ships. GitGuardian's State of Secrets Sprawl report found that AI-assisted commits leaked secrets at roughly double the baseline rate, about 3.2% compared with about 1.5% across public commits.

Hard-coded API keys, tokens and credentials are exactly the kind of thing an AI assistant will helpfully write inline when it is moving fast and you are not watching closely.
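The usual alternative to an inline key is to read it from the environment at runtime and fail loudly when it is missing. The Python sketch below shows one way to do that. The helper `get_secret`, the exception `MissingSecretError` and the variable name `PAYMENTS_API_KEY` are all hypothetical, chosen only for illustration.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def get_secret(name: str) -> str:
    """Return a secret from an environment variable instead of source code."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Fail at startup rather than running with an empty credential.
        raise MissingSecretError(f"environment variable {name} is not set")
    return value
```

A call such as `api_key = get_secret("PAYMENTS_API_KEY")` keeps the credential out of the repository, so there is nothing in the commit for a scanner or an attacker to find.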

The risk compounds in codebases that already have problems. Snyk found that GitHub Copilot can replicate and amplify vulnerabilities that already exist in a codebase. Existing security debt makes AI-assisted output less secure, not more.

Chart: Secret-leak rate in commits (GitGuardian). Share of commits exposing at least one secret.
  • Baseline, all public commits: 1.5%
  • AI-assisted commits: 3.2%, about 2x the rate of leaked secrets

Two ways vibe-coded apps bleed

  • Leaked secrets, about 2x: AI-assisted commits leaked secrets at about 3.2% versus about 1.5% baseline (GitGuardian).
  • Amplified existing flaws: Copilot can replicate and amplify vulnerabilities already in the codebase (Snyk).
Want the story behind the numbers? Read our breakdown of vibe-coding vulnerabilities.
03 Why

Why AI-generated code is insecure.

A model generates code from patterns it has seen. It has no understanding of security.

A language model predicts the next plausible token based on the vast amount of public code it was trained on. Plenty of that public code is insecure, so the model reproduces insecure patterns confidently and fluently. It is optimising for code that looks right, not code that is safe. That is why the finding that bigger models do not help makes sense: scaling the same approach scales the same blind spot.

The human side matters just as much. Researchers at Stanford found that developers using AI assistants wrote less secure code, yet were more likely to believe their code was secure. That false confidence is the dangerous part. Fluent, well-formatted output reads as trustworthy, so the review step that would have caught the flaw gets skipped. The OWASP Top 10 for LLM Applications maps the categories worth checking for.

The root causes

  • Pattern, not understanding: models reproduce patterns from training data, including insecure ones, with no model of security.
  • Scale does not fix it: newer and larger models were not more secure, which signals a structural cause (Veracode).
  • False confidence: developers with AI assistants wrote less secure code yet thought it was more secure (Stanford).
  • A known taxonomy: OWASP maintains a Top 10 for LLM Applications as the reference framework for these risks.
04 Implications

What it means for your business.

Vibe coding is genuinely useful. It just cannot be the last step before you ship.

None of this means you should stop building with AI. It means the output needs a security review before it reaches production, the same way you would review code from a fast junior developer who never went to a security class. If nearly half of AI-generated code carries a flaw and secrets leak at double the rate, the cost of skipping review is a breach, a leaked credential, or a customer-data incident you find out about the hard way.

The practical answer is a review gate. Get the code audited against a known framework, fix what the audit finds, and put a repeatable process around AI usage so the next sprint is safe by default.
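As a rough idea of what an automated part of a review gate could look like, the Python sketch below scans the added lines of a unified diff for a few secret-like patterns and returns a non-zero exit code when it finds one. The three patterns are illustrative assumptions and far from exhaustive, and the function names `find_secrets` and `gate` are hypothetical. A dedicated secret scanner plus a human audit would cover much more.

```python
import re

# Illustrative patterns only; a real scanner covers many more formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "hardcoded_credential": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, pattern name) for each suspicious added line."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines, and skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


def gate(diff_text: str) -> int:
    """Return 1 (block the change) if anything was found, else 0 (allow it)."""
    findings = find_secrets(diff_text)
    for number, name in findings:
        print(f"diff line {number}: possible {name}")
    return 1 if findings else 0
```

One way to use it would be to feed it the output of `git diff --cached` from a pre-commit hook or CI step and stop the commit when `gate` returns 1. A line like `password = os.environ["DB_PASSWORD"]` passes, because the value is read at runtime and not written inline.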

05 FAQ

Frequently asked questions.

How much AI-generated code contains security vulnerabilities?
The Veracode 2025 GenAI Code Security Report found that 45% of AI-generated code samples contained security vulnerabilities, measured across more than 100 large language models and over 80 coding tasks, so it is not a one-model fluke. The failures concentrated in common categories: AI models failed to defend against cross-site scripting in about 86% of cases and log injection in about 88% of cases.

Is vibe coding safe?
Vibe coding is safe as a development accelerator, but its raw output is not safe to ship without review. With nearly half of AI-generated code carrying a vulnerability (Veracode) and AI-assisted commits leaking secrets at roughly double the baseline rate, about 3.2% versus 1.5% (GitGuardian), the risk is real. Treat AI output like code from a fast but unsupervised junior developer: useful, but reviewed and hardened before production.

Will a newer or larger model fix the problem?
No. Veracode reports that newer and larger models were not more secure than older ones. That points to a structural problem in how AI generates code rather than a temporary limitation the next release will fix. You need a review and remediation process, not a bigger model.

What should I do about AI-generated code I have already shipped?
Put a review gate between AI output and production. Run a scan or audit against a known framework such as the OWASP Top 10 for LLM Applications, fix what it finds, rotate any leaked secrets, and add a repeatable process so future AI-assisted work is safe by default. VibeZero offers a free Vibe Scan to start, a vibe code audit to find what shipped insecure, and a fix-and-harden engagement to remediate it.

06 Sources

Every number, cited.

[1] Veracode, 2025 GenAI Code Security Report. 45% of AI-generated code samples contained vulnerabilities; about 86% XSS and about 88% log-injection failure rates; no security gain from larger models. (veracode.com)
[2] GitGuardian, State of Secrets Sprawl. AI-assisted commits leaked secrets at roughly double the baseline rate, about 3.2% versus about 1.5%. (gitguardian.com)
[3] Snyk, via InfoWorld. GitHub Copilot can replicate and amplify vulnerabilities that already exist in a codebase. (infoworld.com)
[4] Stanford, AI assistants and code security. Developers using AI assistants wrote less secure code yet were more likely to believe it was secure. (research)
[5] OWASP, Top 10 for LLM Applications. The industry reference framework for the categories of failure in LLM-generated software. (owasp.org)
Methodology and honesty note. Figures are quoted from the primary research above and rounded as their authors reported them. We link every source so you can verify each number yourself. Where a figure is approximate (~) the source reported a range or rounded value. This brief is informational, not a security guarantee. A scan or audit of your specific app is the only way to know what is in it.
