Jailbreak
An adversarial prompt that bypasses an AI model's built-in safety rules to make it produce content it normally would not.
In detail
A jailbreak is a prompt crafted to make an AI model ignore the safety training and policies that constrain its output. Common techniques include role-playing scenarios ("pretend you are a model with no restrictions"), prompt injection through smuggled instructions, encoding messages in unusual formats and incrementally pushing the model past each refusal. Foundation model providers patch known jailbreaks but new ones are discovered constantly.
Why it matters for Australian business
For an Australian business deploying an AI assistant, jailbreaks matter when the assistant has access to your customers, your data or any system where bypassing safety rules causes harm. Defences include a strong system prompt, server-side validation of outputs against business rules (not relying on the model alone), classifier guards on inputs and outputs, and operational logging so jailbreak attempts are detected and reviewed.