The context window is the total amount of text, measured in tokens, that a language model can hold in working memory during a single interaction. Everything the model reads in that interaction, including the system prompt, the conversation history, any documents retrieved via retrieval-augmented generation (RAG), and the user's current message, must fit within this limit, and the response the model generates typically counts against it as well. Content outside the context window is invisible to the model.
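The "must fit within this limit" constraint comes down to simple budget arithmetic. The sketch below is illustrative only: `estimate_tokens` uses a rough four-characters-per-token heuristic (an assumption; real tokenizers differ by model and language), and the names `fits_in_context`, `context_limit`, and `reserved_for_output` are hypothetical rather than part of any vendor API.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token.

    This ratio is an assumption for illustration; use the model's
    real tokenizer when accuracy matters.
    """
    if not text:
        return 0
    return max(1, len(text) // 4)


def fits_in_context(
    parts: dict[str, str],
    context_limit: int,
    reserved_for_output: int = 0,
) -> tuple[bool, int]:
    """Check whether all prompt parts fit in the window.

    Returns (fits, remaining_tokens). A negative remainder is the
    number of estimated tokens by which the prompt overshoots.
    """
    used = sum(estimate_tokens(text) for text in parts.values())
    remaining = context_limit - reserved_for_output - used
    return remaining >= 0, remaining


# Hypothetical prompt parts, sized so the arithmetic is easy to follow.
prompt_parts = {
    "system_prompt": "s" * 400,   # ~100 tokens
    "history": "h" * 800,         # ~200 tokens
    "user_message": "u" * 40,     # ~10 tokens
}
```

With these parts, `fits_in_context(prompt_parts, context_limit=1000, reserved_for_output=200)` leaves an estimated 490 tokens of headroom, while the same call with `context_limit=400` overshoots by 110. Reserving room for the response is a design choice that assumes output shares the same window.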
Context window sizes vary significantly across models. Earlier models were limited to a few thousand tokens (roughly a few pages of text). Current models from Anthropic, OpenAI, and Google support context windows ranging from 128,000 to over one million tokens, enabling them to process book-length documents or lengthy conversation histories in a single call.
The context window affects how AI systems are designed and what they can do. Systems that need to reason over large documents, long conversation histories, or multiple retrieved sources simultaneously benefit from larger context windows. But size is not the only factor: models can exhibit what researchers call "lost in the middle" behavior, where content placed in the middle of a very long context is attended to less reliably than content at the beginning or end.
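One commonly used mitigation for "lost in the middle" behavior is to reorder retrieved passages so the most relevant ones sit at the edges of the prompt and the least relevant ones land in the middle. The following is a minimal sketch of that heuristic, assuming the input list is already sorted from most to least relevant; the function name `order_for_long_context` is hypothetical, and whether the reordering helps depends on the model and task.

```python
def order_for_long_context(ranked_chunks: list[str]) -> list[str]:
    """Place high-ranked chunks at the start and end of the context.

    `ranked_chunks` is assumed to be sorted most relevant first.
    Even-ranked items fill the front in order; odd-ranked items fill
    the back in reverse, so the weakest chunks end up in the middle.
    """
    front: list[str] = []
    back: list[str] = []
    for rank, chunk in enumerate(ranked_chunks):
        if rank % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


# "A" is the most relevant chunk, "E" the least relevant.
print(order_for_long_context(["A", "B", "C", "D", "E"]))
# ['A', 'C', 'E', 'D', 'B']
```

The two strongest chunks ("A" and "B") end up first and last, and the weakest ("E") sits in the center, where less reliable attention costs the least.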
For organizations building AI products, context window management is a practical engineering concern. How documents are chunked, how conversation history is summarized or truncated, and how retrieved content is ranked and ordered before being placed in the context all affect the quality of the model's responses.
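As one illustration of history truncation, the sketch below keeps only the most recent messages that fit within a token budget. It is a simplified example under stated assumptions: the `Message` type, the `truncate_history` function, and the four-characters-per-token estimate in `rough_token_count` are all hypothetical, and production systems often summarize older turns instead of dropping them.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # e.g. "user" or "assistant"
    content: str


def rough_token_count(text: str) -> int:
    """Approximate tokens as characters / 4 (an assumption, not a tokenizer)."""
    return len(text) // 4


def truncate_history(messages: list[Message], budget: int) -> list[Message]:
    """Keep the newest messages whose estimated total fits in `budget`.

    Walks backward from the latest message and stops at the first
    message that would exceed the budget, so the kept messages form
    a contiguous, chronologically ordered tail of the conversation.
    """
    kept: list[Message] = []
    used = 0
    for message in reversed(messages):
        cost = rough_token_count(message.content)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore oldest-to-newest order
    return kept


history = [
    Message("user", "a" * 40),       # ~10 tokens
    Message("assistant", "b" * 80),  # ~20 tokens
    Message("user", "c" * 40),       # ~10 tokens
]
```

Calling `truncate_history(history, budget=30)` keeps the last two messages and drops the oldest one. Stopping at the first message that does not fit, rather than skipping it and continuing, avoids leaving gaps in the middle of the conversation.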