Context Window
The maximum amount of text (measured in tokens) a language model can process in a single request, covering both input and output.
In detail
The context window is the total number of tokens a language model can attend to at once. The prompt, any conversation history, retrieved documents (in a RAG setup), tool call results and the model's generated response all count against this limit. Earlier generations of models typically offered 4K or 8K tokens; many current models offer 128K tokens or more, and some reach one million or beyond. A larger context window lets you send more documents, longer conversations and richer tool outputs without chunking. The trade-offs are that filling a large context costs more and takes longer, because providers typically bill per input and output token, and that models tend to recall information placed in the middle of a very long context less reliably than information near its start or end.
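Because every component shares one budget, it can help to add up the pieces before sending a request. The sketch below is a minimal illustration, assuming you already have a token count for each input component (in practice these come from the model's own tokenizer) and that you reserve a fixed number of tokens for the response. The function name `remaining_budget` and the component labels are hypothetical.

```python
from typing import Mapping


def remaining_budget(
    context_window: int,
    component_tokens: Mapping[str, int],
    max_output_tokens: int,
) -> int:
    """Return the tokens left after counting every input component
    plus the space reserved for the model's response.

    A negative result means the request will not fit as-is.
    """
    used = sum(component_tokens.values()) + max_output_tokens
    return context_window - used


# Hypothetical request against a 128K-token window.
request = {
    "system_prompt": 1_500,
    "conversation_history": 20_000,
    "retrieved_documents": 90_000,
    "tool_results": 8_000,
}
print(remaining_budget(128_000, request, max_output_tokens=4_000))
```

Here the inputs and output reserve total 123,500 tokens, leaving 4,500 spare. Reserving output space up front matters because the response counts against the same limit as the prompt.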
Why it matters for Australian business
For Australian businesses building AI systems that process long documents (contracts, medical records, lengthy email threads), the context window determines what architecture you need. A workload that fits in a single request is simpler and cheaper to build than one that needs chunking, summarisation and multi-step retrieval. Choosing a model with the right context size for your workload is part of AI infrastructure design, and we factor it into every AI implementation engagement.
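The architectural decision can be sketched as a simple check: send the document whole if it fits, otherwise split it into pieces that each fit. This is a rough illustration under stated assumptions, not a production approach. It estimates tokens at about four characters per token (a common rule of thumb for English text; real counts need the model's tokenizer), and the names `CHARS_PER_TOKEN`, `estimate_tokens` and `plan_requests` are hypothetical.

```python
import math
from typing import List

# Assumption: roughly 4 characters per token for English text.
# Use the model's own tokenizer for accurate counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def plan_requests(
    document: str,
    context_window: int,
    overhead_tokens: int,
    max_output_tokens: int,
) -> List[str]:
    """Return [document] if it fits in one request, otherwise a list of
    chunks that each fit alongside the instructions and output reserve."""
    budget = context_window - overhead_tokens - max_output_tokens
    if budget <= 0:
        raise ValueError("instructions and output reserve leave no room for the document")
    if estimate_tokens(document) <= budget:
        return [document]
    # Each chunk of this many characters estimates to at most `budget` tokens.
    chunk_chars = budget * CHARS_PER_TOKEN
    return [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]


contract = "x" * 1_000_000  # stand-in for a roughly 250K-token document
print(len(plan_requests(contract, 128_000, 2_000, 4_000)))    # needs chunking
print(len(plan_requests(contract, 1_000_000, 2_000, 4_000)))  # fits in one request
```

With a 128K window the stand-in document splits into three chunks, while a one-million-token window takes it in a single request. Splitting on raw character offsets can cut mid-sentence; a real pipeline would more likely split on section or paragraph boundaries and then combine per-chunk results, which is the extra complexity a larger window avoids.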