
Glossary

Context Window

The context window is the maximum amount of text — measured in tokens — that an LLM can process in a single request, including both the prompt you send and the response it generates. Claude 3's context window is up to 200K tokens; GPT-4 Turbo supports 128K.
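Because the prompt and the response share the same window, a request fits only if the prompt's tokens plus the tokens reserved for the reply stay within the limit. The sketch below checks that budget. It assumes a crude heuristic of roughly 4 characters per token (`estimate_tokens` is hypothetical, not a real tokenizer; production code should use the model's own tokenizer). The window sizes are the figures quoted above.

```python
# Window sizes quoted in the definition above (in tokens).
CLAUDE_3_WINDOW = 200_000
GPT_4_TURBO_WINDOW = 128_000


def estimate_tokens(text: str) -> int:
    """Hypothetical estimate: about 4 characters per token, rounded up.

    This is an assumption for illustration only; real tokenizers differ.
    """
    return (len(text) + 3) // 4


def fits_in_context(prompt: str, max_output_tokens: int, window: int) -> bool:
    """Return True if estimated prompt tokens plus reserved output tokens fit the window."""
    return estimate_tokens(prompt) + max_output_tokens <= window
```

For example, a 500,000-character prompt is estimated at 125,000 tokens. With 4,000 tokens reserved for the reply, the request needs 129,000 tokens in total. That fits in a 200K window, so `fits_in_context(prompt, 4_000, CLAUDE_3_WINDOW)` returns `True`. It exceeds a 128K window, so `fits_in_context(prompt, 4_000, GPT_4_TURBO_WINDOW)` returns `False`.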

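The context window determines how much history, documentation, or data a model can "see" at once. Longer contexts let a RAG pipeline pass more retrieved passages to the model and make multi-document synthesis possible in a single request. They also increase latency and cost, because every prompt token must be processed and most APIs bill per token. The small context windows of early models drove the development of chunking strategies and vector-search retrieval systems, which send the model only the most relevant passages so the input stays within the limit.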
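The chunk-and-retrieve workaround can be sketched in two steps. First, split a long document into fixed-size, overlapping chunks. Second, after some retriever has ranked those chunks, greedily keep the top-ranked ones that fit a token budget. This is a minimal illustration under stated assumptions:

- The token list is taken as given (for example, whitespace-split words standing in for real tokens).
- The ranking is assumed to come from a vector search that is not shown.
- `chunk_tokens` and `pack_chunks` are hypothetical helper names.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size; each window repeats the last
    `overlap` tokens of the previous one so context is not cut mid-thought."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


def pack_chunks(ranked_chunks: list[list[str]], budget: int) -> list[list[str]]:
    """Greedily keep chunks, in ranked order, whose combined length fits the budget.

    Assumes ranked_chunks is already sorted by relevance (e.g. by a vector search).
    """
    selected: list[list[str]] = []
    used = 0
    for chunk in ranked_chunks:
        if used + len(chunk) <= budget:
            selected.append(chunk)
            used += len(chunk)
    return selected
```

Chunking ten tokens with `chunk_size=4` and `overlap=1` yields three windows that each share one token with their neighbor. Packing ranked chunks of length 5, 4, and 2 into a budget of 7 keeps the first and third. The second chunk would push the total to 9, so it is skipped. Real pipelines often count tokens with the model's tokenizer and may stop at the first chunk that does not fit rather than skipping ahead.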