Token
The unit of text that language models process, roughly corresponding to a word or a few characters.
In detail
Language models do not process characters or words directly. Instead, a tokeniser splits text into tokens, which are usually sub-word units. In English a token averages roughly three to four characters, or about three-quarters of a word. Common words such as 'the' are a single token, while long or uncommon words may span several tokens. API calls to hosted models are priced in tokens, typically quoted per million tokens with separate rates for input and output. The context window is also measured in tokens. Code, structured data and non-English languages typically tokenise less efficiently than plain English prose, so the same content consumes more tokens.
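Sub-word splitting can be illustrated with a greedy longest-match tokeniser over a tiny vocabulary. This is a simplified sketch, not how production tokenisers work: most use learned schemes such as byte-pair encoding, with vocabularies of tens of thousands of entries. The vocabulary `VOCAB` and the function `tokenise` are hypothetical, chosen only to show a common word staying whole while longer words break into pieces.

```python
# Hypothetical toy vocabulary. Real tokenisers learn much larger vocabularies
# from data and apply their own merge rules.
VOCAB = frozenset({"the", "token", "isation", "un", "common", "s", " "})


def tokenise(text: str, vocab: frozenset = VOCAB) -> list:
    """Greedy longest-match sub-word tokeniser (illustrative only)."""
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenise("the"))           # ['the']
print(tokenise("tokenisation"))  # ['token', 'isation']
print(tokenise("uncommon"))      # ['un', 'common']
```

Characters with no vocabulary entry fall back to one token each. This is why unfamiliar text, such as identifiers in code or words from languages under-represented in the vocabulary, tends to produce more tokens.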
Why it matters for Australian business
Australian businesses using AI APIs need to understand tokens for three reasons: to manage budgets, to design prompts that fit within context windows, and to explain why the same task costs more or less depending on the model or input format, since models differ in both their tokenisers and their per-token prices. A customer support assistant that processes long email threads, PDF attachments and conversation history can hit context limits quickly. We size infrastructure and select models based on the token economics of the specific workload.
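For early budgeting and context planning, a rough estimate is often enough before any API call is made. This sketch assumes the rule of thumb of about four characters per token for English prose. The helper names `estimate_tokens`, `estimate_cost` and `fits_context` are hypothetical, and the prices and context window in the usage lines are placeholders, not any provider's actual rates. For exact counts, use the tokeniser published for the specific model.

```python
import math

# Rule-of-thumb assumption: about four characters per token for English prose.
# Code, structured data and non-English text usually need more tokens, so this
# tends to underestimate for those inputs.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token count from character length, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost given separate prices per million input and output tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def fits_context(parts: list, context_window: int, reserved_output: int) -> bool:
    """True if all input parts plus space reserved for the reply fit the window."""
    total_input = sum(estimate_tokens(part) for part in parts)
    return total_input + reserved_output <= context_window


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output.
print(estimate_cost(1_000_000, 200_000, 3.00, 15.00))  # 6.0

# Hypothetical support ticket: email thread, extracted PDF text, chat history.
thread, pdf_text, history = "e" * 40_000, "p" * 20_000, "h" * 8_000
print(fits_context([thread, pdf_text, history], 16_000, 1_000))  # False
```

In the last example the three inputs come to about 17,000 estimated tokens. Even before reserving room for the reply, that exceeds the hypothetical 16,000-token window, so the thread would need trimming or summarising first.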