

Modern AI agents are only as useful as the actions they can take. Large language models are great at writing, but they cannot fetch emails, post to Slack, or schedule meetings on their own. Tool calling is the bridge between text generation and real work.
An LLM takes text in and produces text out. It does not directly call APIs or run SDK functions. Tool calling gives it this ability.
Think of a tool as a function the model can ask your system to execute. A tool might fetch your last 20 emails, send a message, or create a GitHub issue.
When the model decides it needs a tool, it sends a message describing which tool to call and what arguments to pass. Your application reads that message, runs the tool, and returns the output. That loop is how an LLM gets things done.
If you want to understand how tool calling works under the hood, read more here.
💡 If you're new to LangChain: LangChain is a framework that helps connect LLMs to external data and tools. It handles tool binding, context management, and reasoning chains. Check it out here.
Here is a minimal Python example that binds a tool the model can use automatically:
```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise x to the power of y."""
    return x ** y

# Define the LLM and bind the tool
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
llm_with_tools = llm.bind_tools([exponentiate])
```
Now the model can decide to call exponentiate whenever it needs it.
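Binding only tells the model the tool exists; your application still has to run it. Here's a minimal sketch of the loop described earlier, continuing from the snippet above (the user question is made up):

```python
from langchain_core.messages import HumanMessage, ToolMessage

messages = [HumanMessage("What is 2 to the power of 10?")]
ai_msg = llm_with_tools.invoke(messages)  # model responds with a tool-call message
messages.append(ai_msg)

# Run each requested tool and feed the result back to the model
for call in ai_msg.tool_calls:
    result = exponentiate.invoke(call["args"])
    messages.append(ToolMessage(content=str(result), tool_call_id=call["id"]))

final = llm_with_tools.invoke(messages)
print(final.content)  # e.g. "2 to the power of 10 is 1024."
```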

In Gaia's early days we bound every available integration directly to the model. It worked at small scale, then slowed down as we added Gmail, Calendar, Slack, GitHub, Linear, Notion, and more.
We also ran into a practical limit: OpenAI caps the number of tools per request at 128. Each tool's schema and description must be added to the context window, so more tools mean more tokens, less room for reasoning, and higher latency and cost.
We hit that wall quickly.
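To see why, look at what binding actually ships to the API: every bound tool is serialized into a JSON schema that rides along with each request. Using the exponentiate tool from above (output shown roughly):

```python
from langchain_core.utils.function_calling import convert_to_openai_tool

# Each bound tool becomes a schema like this, paid for in tokens on every call
print(convert_to_openai_tool(exponentiate))
# {'type': 'function',
#  'function': {'name': 'exponentiate',
#               'description': 'Raise x to the power of y.',
#               'parameters': {'type': 'object',
#                              'properties': {'x': {'type': 'number'},
#                                             'y': {'type': 'number'}},
#                              'required': ['x', 'y']}}}
```

Multiply that by a few hundred tools and the context fills up before the user's request even arrives.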
Even before reaching hard limits, hundreds of tools create context pollution.
The model starts confusing similar tools, or fails to call one at all. The real question became:
How can we support thousands of tools without choking the LLM context window?
One idea was to list every tool's name in the system prompt and add a single search_tool. When the model needed a capability, it would call search_tool, which would look up the right tool and bind it dynamically.
Sounds smart, right? Not really.
LLMs are too unpredictable: you can't rely on them to reproduce exact tool names or follow strict lookup patterns.
We actually tried this at one point, and it worked only for small sets (under ~30 tools). With hundreds or thousands of tools, the method collapses.
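For the record, here's roughly what that naive approach looked like, as a hedged sketch (the registry and names are illustrative, not our production code):

```python
from langchain_core.tools import tool

# Hypothetical flat registry of every integration tool, keyed by exact name
ALL_TOOLS: dict = {}  # e.g. {"gmail_fetch_emails": ..., "github_create_issue": ...}

@tool
def search_tool(name: str) -> str:
    """Look up a tool by its exact name so it can be bound for the next turn."""
    if name in ALL_TOOLS:
        return f"Found {name}; it will be bound for the next step."
    # This branch fires constantly at scale: the model guesses
    # half-remembered names like "get_gmail" or "github_prs_list"
    return f"No tool named {name!r}."
```

The failure mode is exactly that last branch: past a few dozen tools, the model's guesses stop matching the registry's exact keys.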

Then came the turning point — LangGraph Big Tools.
It’s a small package that turns tool lookup from exact name matching into semantic retrieval.
How it works:
- Each tool's name and description is embedded into a vector store like ChromaDB.
- When the LLM needs a tool, it doesn't guess names. It writes a natural-language query, e.g. "find a tool that fetches latest GitHub pull requests."
- A retrieve_tools step queries the vector store and finds the most relevant matches.
- Those tools are then dynamically bound to the model at runtime.
Now, instead of carrying all tools in the prompt, the model retrieves them on demand.
This reduces context size, cost, and confusion dramatically.
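Here's a minimal sketch of that retrieval step, assuming an OpenAI embedding model and a Chroma vector store. The tools are hypothetical stand-ins, and this illustrates the idea rather than the package's exact API:

```python
from langchain_chroma import Chroma
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

@tool
def fetch_github_prs(repo: str) -> str:
    """Fetch the latest pull requests for a GitHub repository."""
    return f"PRs for {repo}..."

@tool
def send_slack_message(channel: str, text: str) -> str:
    """Send a message to a Slack channel."""
    return f"Sent to {channel}."

tools = [fetch_github_prs, send_slack_message]

# Index each tool's name + description for semantic lookup
store = Chroma.from_texts(
    texts=[f"{t.name}: {t.description}" for t in tools],
    embedding=OpenAIEmbeddings(),
    metadatas=[{"name": t.name} for t in tools],
)

def retrieve_tools(query: str, k: int = 2) -> list:
    """Return the k tools whose descriptions best match the query."""
    by_name = {t.name: t for t in tools}
    return [by_name[d.metadata["name"]] for d in store.similarity_search(query, k=k)]

# The model writes a natural-language query instead of guessing an exact name
relevant = retrieve_tools("find a tool that fetches latest GitHub pull requests")
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
llm_with_tools = llm.bind_tools(relevant)  # only the relevant tools enter the context
```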

After solving discovery, we needed a way to manage real-world integrations efficiently.
That’s where Composio came in — a platform that provides ready-made, authenticated tools for popular services like Slack, Gmail, GitHub, and more.
We built a ToolRegistry around the langgraph_bigtools setup to load these Composio tools and serve them to agents on demand. On top of that, Gaia runs sub-agents: isolated agents for each integration (GitHub, Linear, Slack, etc.).
Each sub-agent has its own toolset and logic, managed through the ToolRegistry.
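The registry itself doesn't need to be complicated. Here's a hedged sketch of its shape (names and structure are illustrative, not Gaia's actual code):

```python
from collections import defaultdict

class ToolRegistry:
    """Maps each integration to its toolset so sub-agents stay isolated."""

    def __init__(self):
        self._by_integration = defaultdict(list)

    def register(self, integration: str, tools: list) -> None:
        # e.g. register("github", <Composio-provided GitHub tools>)
        self._by_integration[integration].extend(tools)

    def tools_for(self, integration: str) -> list:
        """Everything the sub-agent for this integration is allowed to bind."""
        return list(self._by_integration[integration])

registry = ToolRegistry()
# registry.register("github", [...])  # authenticated tools from Composio
# registry.register("slack", [...])
# github_agent = llm.bind_tools(registry.tools_for("github"))
```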

With this architecture, Gaia handles thousands of tools efficiently. Each user can connect multiple accounts and integrations without bloating the LLM context. The system is scalable, modular, and context-aware.
With retrieval-based binding, an AI system can connect to a wide world of tools and still think clearly.
This is only half the story.
Gaia also uses complex sub-graphs and sub-agents that coordinate multiple integrations in parallel.
For example, one agent might handle your inbox, another your project management, and a third your calendar — all talking to each other through a higher-level control graph.
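As a rough illustration, a control graph in LangGraph could route between sub-agents like this. The routing here is a toy keyword check standing in for an LLM-driven router, and the sub-agent bodies are stubs:

```python
from typing import Literal
from langgraph.graph import StateGraph, MessagesState, START, END

def email_agent(state: MessagesState):
    # In Gaia this would be a full sub-graph with its own tools
    return {"messages": [("ai", "handled the inbox task")]}

def calendar_agent(state: MessagesState):
    return {"messages": [("ai", "handled the calendar task")]}

def route(state: MessagesState) -> Literal["email", "calendar"]:
    # Toy stand-in: a real router would ask the LLM where to send the request
    text = state["messages"][-1].content.lower()
    return "calendar" if "meeting" in text else "email"

builder = StateGraph(MessagesState)
builder.add_node("email", email_agent)
builder.add_node("calendar", calendar_agent)
builder.add_conditional_edges(START, route)
builder.add_edge("email", END)
builder.add_edge("calendar", END)
graph = builder.compile()

result = graph.invoke({"messages": [("user", "set up a meeting for Friday")]})
print(result["messages"][-1].content)  # "handled the calendar task"
```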
