Glossary · AI & Development

Context Window

The maximum amount of text (measured in tokens) a language model can process in a single request, covering both input and output.

In detail

The context window is the total number of tokens a language model can attend to at once. The prompt, any conversation history, retrieved documents (in a RAG setup), tool call results and the model's generated response all count against this limit. Older models had context windows of 4K or 8K tokens; many current models offer 128K tokens, and some offer one million or more. A larger context window lets you send more documents, longer conversations and richer tool outputs without chunking. The trade-offs are cost, since pricing typically scales with the number of tokens processed, and accuracy, since a model can overlook information buried in the middle of a very long context.
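As a rough illustration of how these pieces add up against the limit, the sketch below estimates whether a request fits. It assumes about four characters per token, a common rule of thumb for English text rather than an exact count; real counts come from the model provider's tokenizer. The function names and the 128,000-token default are hypothetical, chosen only for illustration.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# Use the model provider's tokenizer when an exact count matters.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(parts: list[str], reserved_output_tokens: int,
                    context_window: int = 128_000) -> bool:
    """Return True if every input part plus the reserved output fits the window.

    `parts` holds everything sent to the model: prompt, conversation
    history, retrieved documents and tool results.
    """
    input_tokens = sum(estimate_tokens(part) for part in parts)
    return input_tokens + reserved_output_tokens <= context_window
```

Reserving tokens for the response matters because the output shares the same window as the input: a request that fills the window with input leaves no room for an answer.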

Why it matters for Australian business

For Australian businesses building AI systems that process long documents (contracts, medical records, lengthy email threads), the context window determines what architecture you need. A workload that fits in a single request is simpler and cheaper to build and run than one that needs chunking, summarisation and multi-step retrieval. Choosing a model with the right context size for your workload is part of AI infrastructure design, and we factor it into every AI implementation engagement.
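The architectural choice described here can be sketched as a simple decision: send the document in one request if it fits, otherwise split it into overlapping chunks. This is a minimal sketch, not a production design. The four-characters-per-token ratio is a rough assumption, the function names are hypothetical, and a real system would usually split on paragraph or section boundaries rather than raw character offsets.

```python
# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def split_into_chunks(text: str, max_chars: int, overlap_chars: int = 0) -> list[str]:
    """Split text into chunks of at most max_chars, each overlapping the previous."""
    if max_chars <= 0 or not 0 <= overlap_chars < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap_chars < max_chars")
    step = max_chars - overlap_chars
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


def plan_requests(document: str, context_window: int, reserved_tokens: int,
                  overlap_chars: int = 200) -> list[str]:
    """Return one piece if the document fits in a single request, else chunks.

    `reserved_tokens` covers the instructions and the model's response,
    which share the window with the document.
    """
    budget_chars = (context_window - reserved_tokens) * CHARS_PER_TOKEN
    if budget_chars <= overlap_chars:
        raise ValueError("context window too small for the reserved tokens")
    if len(document) <= budget_chars:
        return [document]  # simple case: a single request
    return split_into_chunks(document, budget_chars, overlap_chars)
```

A result with one element means the simpler single-request design is enough. More than one means each chunk needs its own request, plus a later step to combine the partial results.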

