Does using a bigger AI model fix it?

No. Veracode reports that newer and larger models were not more secure than older ones. That points to a structural problem in how AI generates code rather than a temporary limitation the next release will fix. You need a review and remediation process, not a bigger model.

The State of Vibe-Coded App Security

45% of AI-generated code ships with vulnerabilities

In the largest study of its kind, the Veracode 2025 GenAI Code Security Report found that 45% of AI-generated code samples contained security vulnerabilities, tested across more than 100 large language models and 80-plus coding tasks. The pattern held regardless of model size or recency. This brief collects the verified, citable research so you can see what the data actually says before you ship.


Published evidence · observed or measured

45% of AI-generated code samples contained vulnerabilities (Veracode 2025)
~2x the baseline rate of secret leaks in AI-assisted commits (GitGuardian)
86% of cases in which AI models failed to defend against cross-site scripting (Veracode)
None: no improvement from newer or larger models (Veracode)

01 The data

What the research actually shows

Across 100-plus models and 80-plus tasks, nearly half of all AI-generated code carried a security flaw.

The headline figure comes from the Veracode 2025 GenAI Code Security Report. It found that 45% of AI-generated code samples contained security vulnerabilities, measured across more than 100 large language models and over 80 coding tasks (coverage via Help Net Security).

The failures were not evenly spread. AI models failed to defend against cross-site scripting in about 86% of cases and log injection in about 88% of cases, according to the same research.
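To make those two categories concrete, here is a minimal Python sketch of the defences involved. It is an illustration, not code from the Veracode report. The function names `render_comment` and `sanitize_for_log` are hypothetical, and a real application would normally lean on its template engine's auto-escaping and a structured logger rather than hand-rolled helpers.

```python
import html


def render_comment(user_text: str) -> str:
    """Return an HTML fragment with the user-supplied text escaped."""
    # html.escape converts &, < and > (and, by default, quotes) to entities,
    # so markup inside user_text is displayed as text instead of executed.
    return f"<p>{html.escape(user_text)}</p>"


def sanitize_for_log(value: str) -> str:
    """Neutralise line breaks so one input cannot forge extra log lines."""
    # Replace raw CR and LF with visible escape sequences.
    return value.replace("\r", "\\r").replace("\n", "\\n")
```

Calling `render_comment('<script>alert(1)</script>')` returns the tag as inert text, and `sanitize_for_log("bob\nINFO admin logged in")` stays on a single log line. Omitting either step is the kind of gap the failure rates above describe.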

The most uncomfortable finding is what did not change. Veracode reports that newer and larger models were not more secure than older ones, pointing to a structural problem in how AI generates code, not a limitation the next release will quietly fix. You cannot wait out this risk by upgrading to a bigger model.

Figure: Vulnerable versus clean samples (Veracode 2025). Sample-level vulnerability rate, all models tested: 45% carried a vulnerability across 100-plus models and 80-plus tasks; 55% were clean, but you cannot tell which is which without a review.

Figure: Where AI code fails most (Veracode 2025, share of cases failing to defend)

Log injection, failed to defend: 88%
Cross-site scripting (XSS), failed to defend: 86%
Any vulnerability, all samples: 45%

The verified numbers

Sourced

45% of AI-generated code. Contained security vulnerabilities across 100-plus models and 80-plus tasks (Veracode 2025).
~86% cross-site scripting. Share of cases where AI models failed to defend against XSS (Veracode).
~88% log injection. Share of cases where AI models failed to defend against log injection (Veracode).
No size advantage. Newer and larger models were not more secure than older ones (Veracode).

02 Secrets

Where the data leaks

AI-assisted commits leaked secrets at roughly double the baseline rate.

It is not only vulnerable logic that ships. GitGuardian's State of Secrets Sprawl report found that AI-assisted commits leaked secrets at roughly double the baseline rate, about 3.2% compared with about 1.5% across public commits.

Hardcoded API keys, tokens and credentials are exactly the kind of thing an AI assistant will helpfully write inline when it is moving fast and you are not watching closely.
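As a minimal sketch of the alternative, the snippet below reads a credential from the environment instead of writing it inline. The variable name `PAYMENTS_API_KEY` and the helper `load_api_key` are hypothetical, chosen only for illustration; a managed secret store would serve the same purpose.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured."""


def load_api_key(name: str = "PAYMENTS_API_KEY") -> str:
    """Fetch a secret from the environment, failing loudly if it is absent."""
    # The default variable name is a placeholder, not a real convention.
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value
```

Failing at startup when the variable is missing keeps the key out of the repository and makes a misconfigured deployment obvious, rather than falling back to a value committed in source.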

The risk compounds in codebases that already have problems. Snyk found that GitHub Copilot can replicate and amplify vulnerabilities that already exist in a codebase. Existing security debt makes AI-assisted output less secure, not more.

Figure: Secret-leak rate in commits (GitGuardian). Share of commits exposing at least one secret.

Baseline, all public commits: 1.5%
AI-assisted commits: 3.2%, about 2x the rate of leaked secrets

Two ways vibe-coded apps bleed

Sourced

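Leaked secrets, about 2x. AI-assisted commits leaked secrets at about 3.2% versus about 1.5% baseline (GitGuardian).
Amplified existing flaws. GitHub Copilot can replicate and amplify vulnerabilities that already exist in a codebase (Snyk).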


03 Why

Why AI-generated code is insecure

A model generates code from patterns it has seen. It has no understanding of security.

A language model predicts the next plausible token based on the vast amount of public code it was trained on. Plenty of that public code is insecure, so the model reproduces insecure patterns confidently and fluently. It is optimising for code that looks right, not code that is safe. That is why the finding that bigger models do not help makes sense. Scaling the same approach scales the same blind spot.

The human side matters just as much. Researchers at Stanford found that developers using AI assistants wrote less secure code, yet were more likely to believe their code was secure. That false confidence is the dangerous part. Fluent, well-formatted output reads as trustworthy, so the review step that would have caught the flaw gets skipped. The OWASP Top 10 for LLM Applications maps the categories worth checking for.

The root causes

Why

Pattern, not understanding. Models reproduce patterns from training data, including insecure ones, with no model of security.
Scale does not fix it. Newer and larger models were not more secure, which signals a structural cause (Veracode).
False confidence. Developers with AI assistants wrote less secure code yet were more likely to believe it was secure (Stanford).
A known taxonomy. OWASP maintains a Top 10 for LLM Applications as the reference framework for these risks.

04 Implications

What it means for your business

Vibe coding is genuinely useful. It just cannot be the last step before you ship.

None of this means you should stop building with AI. It means the output needs a security review before it reaches production, the same way you would review code from a fast junior developer who never took a security class. If nearly half of AI-generated code carries a flaw and secrets leak at double the rate, the cost of skipping review is a breach, a leaked credential, or a customer data incident you find out about the hard way.

The practical answer is a review gate. Get the code audited against a known framework, fix what the audit finds, and put a repeatable process around AI usage so the next sprint is safe by default.
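One small piece of such a gate can be sketched in Python: a check that flags likely hardcoded secrets and returns a non-zero status so a CI job can block the merge. This is an assumption-laden illustration, not VibeZero's scanner or any vendor's rule set. The three patterns are examples only, the names `SECRET_PATTERNS`, `find_secrets` and `main` are hypothetical, and a regex check like this complements a full audit rather than replacing it.

```python
import re
from pathlib import Path

# Illustrative patterns only; dedicated scanners use far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "inline_credential": re.compile(
        r"(?i)(api_key|secret|token|password)\b\s*[:=]\s*[\"'][^\"']{8,}[\"']"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for each line matching a pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


def main(paths: list[str]) -> int:
    """Scan files and return 1 if anything is flagged, otherwise 0."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        for line_number, rule_name in find_secrets(text):
            print(f"{path}:{line_number}: possible {rule_name}")
            failed = True
    return 1 if failed else 0
```

Wired into a pipeline, the script would end with `raise SystemExit(main(sys.argv[1:]))` after importing `sys`, so any finding fails the build. Expect false positives and misses from patterns this simple; the point is that the gate runs on every change, not that these three rules are sufficient.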

What to do about it

Action

Review before you ship. Run a free Vibe Scan
Audit what already shipped. Get a vibe code audit
Fix and harden the app. Fix my AI app
Lock down the process. AI security
Build safely next time. Build with AI

05 FAQ

Frequently asked questions

How much AI-generated code has security vulnerabilities?

The Veracode 2025 GenAI Code Security Report found that 45% of AI-generated code samples contained security vulnerabilities, measured across more than 100 large language models and over 80 coding tasks, so it is not a one-model fluke. The failures concentrated in common categories. AI models failed to defend against cross-site scripting in about 86% of cases and log injection in about 88% of cases.

Is vibe coding safe?

Vibe coding is safe as a development accelerator, but its raw output is not safe to ship without review. With nearly half of AI-generated code carrying a vulnerability (Veracode) and AI-assisted commits leaking secrets at roughly double the baseline rate, about 3.2% versus 1.5% (GitGuardian), the risk is real. Treat AI output like code from a fast but unsupervised junior developer: useful, but only once it has been reviewed and hardened before production.

How do I make a vibe-coded app secure?

Put a review gate between AI output and production. Run a scan or audit against a known framework such as the OWASP Top 10 for LLM Applications, fix what it finds, rotate any leaked secrets, and add a repeatable process so future AI-assisted work is safe by default. VibeZero offers a free Vibe Scan to start, a vibe code audit to find what shipped insecure, and a fix and harden engagement to remediate it.

06 Sources

Every number, cited

[1] Veracode, 2025 GenAI Code Security Report. 45% of AI-generated code samples contained vulnerabilities; about 86% XSS and about 88% log-injection failure rates; no security gain from larger models. veracode.com

[2] GitGuardian, State of Secrets Sprawl. AI-assisted commits leaked secrets at roughly double the baseline rate, about 3.2% versus about 1.5%. gitguardian.com

[3] Snyk, via InfoWorld. GitHub Copilot can replicate and amplify vulnerabilities that already exist in a codebase. infoworld.com

[4] Stanford, AI assistants and code security. Developers using AI assistants wrote less secure code yet were more likely to believe it was secure. research

[5] OWASP, Top 10 for LLM Applications. The industry reference framework for the categories of failure in LLM-generated software. owasp.org

Methodology and honesty note. Figures are quoted from the primary research above and rounded as their authors reported them. We link every source so you can verify each number yourself. Where a figure is approximate (~) the source reported a range or rounded value. This brief is informational, not a security guarantee. A scan or audit of your specific app is the only way to know what is in it.

Act on the evidence

Shipped something built with AI? Find out what is in it

Start with a fast scan, then put a proportionate review gate between AI output and production.


