Context Window
The context window is the maximum number of tokens a language model can process in a single inference call, encompassing the system prompt, conversation history, retrieved documents, and generated output.
理解する Context Window
The context window defines the working memory of a language model. Everything the model knows about the current task, including instructions, conversation history, retrieved documents, and tool outputs, must fit within this window. Content outside the window is effectively invisible to the model during that inference. Context windows have grown dramatically. Early GPT models had 4,096-token limits. Modern models support 128,000 (GPT-4o), 200,000 (Claude 3.5), and even 1,000,000+ tokens (Gemini 1.5 Pro). These expanded windows allow entire codebases, books, or long conversation histories to fit in a single context. Despite this growth, context windows still have practical limits. Processing a full context window is more expensive and slower than a shorter context. Research also shows that LLM attention can degrade for content in the middle of very long contexts, a phenomenon called 'lost in the middle.' Retrieval strategies that select the most relevant content outperform naive approaches that include everything. For AI agents like GAIA, managing the context window is an engineering challenge. Each tool call consumes tokens for its input and output. Long conversation histories accumulate. Retrieved documents add bulk. Effective context management, through summarization, selective retrieval, and conversation compression, is essential for reliable agent performance.
GAIAの活用方法 Context Window
GAIA actively manages context windows to maintain reliable agent performance. It uses selective RAG retrieval to include only the most relevant context, summarizes long conversation histories to compress older content, and chunks large documents before processing. This careful context management allows GAIA to handle complex multi-step workflows without hitting token limits or degrading reasoning quality.
関連概念
Tokenization
Tokenization is the process of breaking text into smaller units called tokens, which serve as the basic input units for language models. Tokens typically represent word fragments, whole words, or punctuation.
Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a technique that enhances LLM responses by first retrieving relevant documents or data from an external knowledge base and injecting that context into the model's prompt.
Large Language Model (LLM)
A Large Language Model (LLM) is a deep learning model trained on massive text datasets that can understand, generate, and reason about human language across a wide range of tasks.
大規模言語モデル(LLM)
大規模言語モデル(LLM)は、膨大なテキストデータでトレーニングされた人工知能モデルであり、人間のような流暢さで言語を理解、生成、推論できます。


