Glossary · AI & Development

Jailbreak

An adversarial prompt that bypasses an AI model's built-in safety rules to make it produce content it normally would not.

In detail

A jailbreak is a prompt crafted to make an AI model ignore the safety training and policies that constrain its output. Common techniques include role-playing scenarios ("pretend you are a model with no restrictions"), prompt injection through smuggled instructions, encoding messages in unusual formats and incrementally pushing the model past each refusal. Foundation model providers patch known jailbreaks but new ones are discovered constantly.

Why it matters for Australian business

For an Australian business deploying an AI assistant, jailbreaks matter when the assistant has access to your customers, your data or any system where bypassing safety rules causes harm. Defences include a strong system prompt, server-side validation of outputs against business rules (not relying on the model alone), classifier guards on inputs and outputs, and operational logging so jailbreak attempts are detected and reviewed.

How we help with this

Service

← All glossary terms

BUILD

FIX & SECURE

CONSULT & TRAIN

Jailbreak

In detail

Why it matters for Australian business

How we help with this

AI App Security

AI Agents

AI Consulting

Related terms

Want to talk through how this applies to your business? Book a free consult.