Search the capability, not the manual

Loading the whole manual does not scale

There is a naive way to give an agent access to an API, and almost everyone tries it first. You take the OpenAPI specification, you load it into the context window, and you let the model read the manual before it acts. It works for one small API. It falls apart the moment you have a real platform.

Here is the arithmetic that breaks it. A public media analytics platform we work with runs eleven API connections. Its catalogued operations total roughly eighteen hundred: the CRM and fundraising system alone contributes well over five hundred, and that is one connection of eleven. Loaded as full schemas with request and response shapes, that surface runs into the millions of tokens before a single byte of actual data is retrieved. You cannot put that in a context window. Even if you could afford to, you should not, because the agent would spend its attention reading documentation instead of solving the problem, and most of what it read would be irrelevant to the task at hand.
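To make the arithmetic concrete, here is a minimal sketch. The operation count comes from the platform described above; the per-operation token figure is an assumption chosen for illustration, not a measurement.

```python
# "Roughly eighteen hundred" catalogued operations, as stated in the text.
OPERATIONS = 1_800

# ASSUMPTION: average tokens for one full operation schema, including
# request and response shapes. Illustrative only, not a measured value.
ASSUMED_TOKENS_PER_OPERATION = 1_500


def upfront_tokens(operations: int, tokens_per_operation: int) -> int:
    """Tokens consumed by loading every schema before any call is made."""
    return operations * tokens_per_operation


total = upfront_tokens(OPERATIONS, ASSUMED_TOKENS_PER_OPERATION)
print(f"{total:,} tokens loaded before the first request")
```

Under that assumption the up-front load is 2.7 million tokens. A different per-operation average moves the number, but not the conclusion, as long as full schemas cost more than a few hundred tokens each.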

The instinct to load everything up front is the same instinct that fails with tools generally. More is not better. The agent does not need to know every operation. It needs to find the one operation that does the thing it is trying to do, at the moment it is trying to do it.

A small tool footprint over an unbounded surface

Plexara's answer is to keep the agent's actual toolset small and fixed, regardless of how many APIs sit behind it. A handful of stable, general tools cover the entire integration surface: list the configured connections; list the sections of a given API; search the operations of an API for a capability; fetch the schema for one specific operation, on demand; invoke an operation; and stream a large response to storage instead of through the model.

That is the whole footprint. It does not grow when you add a connection. Adding the orchestration engine, or the events platform, or a second CRM, does not add tools to the agent's working set. It adds rows to a registry that the same six tools already know how to traverse. The agent's cognitive load stays flat while the platform's reach expands without limit. This is the deliberate inverse of the tool-sprawl approach, where every new system bolts another handful of bespoke tools onto an ever-heavier agent.
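A minimal sketch of that shape, with the registry as data and the toolset as a fixed list. The tool names, the `Registry` class, and the sample operations are all hypothetical, chosen for illustration; they are not Plexara's actual interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    name: str
    section: str
    summary: str


@dataclass
class Registry:
    """Maps a connection name to its catalogued operations."""
    connections: dict[str, list[Operation]] = field(default_factory=dict)

    def add_connection(self, name: str, operations: list[Operation]) -> None:
        # Adding a connection adds rows here. It does not add tools.
        self.connections[name] = operations


# Hypothetical names for the six general tools described above.
TOOLS = (
    "list_connections",
    "list_sections",
    "search_operations",
    "get_operation_schema",
    "invoke_operation",
    "stream_to_storage",
)


def list_connections(registry: Registry) -> list[str]:
    return sorted(registry.connections)


def list_sections(registry: Registry, connection: str) -> list[str]:
    return sorted({op.section for op in registry.connections[connection]})


registry = Registry()
registry.add_connection("crm", [Operation("listDonors", "donors", "List donor records")])
tools_before = len(TOOLS)

registry.add_connection("events", [Operation("listEvents", "events", "List scheduled events")])
tools_after = len(TOOLS)

print(list_connections(registry), tools_before, tools_after)
```

The registry grew from one connection to two while the tool count stayed at six. Only two of the six tools are implemented here; the others would traverse the same registry in the same way.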

Semantic discovery: find the endpoint by intent

A small toolset only works if the search tool is good, because search is now how the agent navigates everything. This is where semantic discovery earns its place.

When the agent needs a capability, it does not scroll a list of operation names hoping to recognize one. It describes what it wants and the platform ranks the operations by relevance. The ranking can be lexical for exact term matching, semantic for matching by intent when the agent's phrasing does not share vocabulary with the spec author's, or a hybrid of both. "Find the flows that are running" can surface the right orchestration endpoint even when the operation is named something the agent would never have guessed, because the match is on meaning rather than on string overlap.
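A toy sketch of hybrid ranking follows. The lexical score is plain term overlap. The semantic score uses a small hand-written concept map as a stand-in for an embedding model, which is an assumption made to keep the example self-contained; a real system would compare embedding vectors instead. The operation names and summaries are invented for illustration.

```python
import math
import re

# ASSUMPTION: a hand-written concept map standing in for an embedding model.
CONCEPTS = {
    "flows": "workflow", "pipeline": "workflow", "executions": "workflow",
    "running": "active", "progress": "active", "active": "active",
    "donors": "donor", "contacts": "donor",
}


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def lexical_score(query: str, doc: str) -> float:
    """Fraction of distinct query terms that appear verbatim in the document."""
    q, d = set(tokens(query)), set(tokens(doc))
    return len(q & d) / len(q) if q else 0.0


def concept_vector(text: str) -> dict[str, float]:
    vec: dict[str, float] = {}
    for tok in tokens(text):
        concept = CONCEPTS.get(tok)
        if concept:
            vec[concept] = vec.get(concept, 0.0) + 1.0
    return vec


def semantic_score(query: str, doc: str) -> float:
    """Cosine similarity between the concept vectors of query and document."""
    a, b = concept_vector(query), concept_vector(doc)
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_rank(query: str, operations: dict[str, str], alpha: float = 0.5) -> list[str]:
    """Rank operation names; alpha=1.0 is purely lexical, alpha=0.0 purely semantic."""
    def score(name: str) -> float:
        text = operations[name]
        return alpha * lexical_score(query, text) + (1 - alpha) * semantic_score(query, text)
    return sorted(operations, key=score, reverse=True)


# Hypothetical operations: name -> one-line summary.
OPERATIONS = {
    "listActiveExecutions": "Return pipeline executions currently in progress",
    "listDonors": "Return donors matching a filter",
    "findArchivedReports": "Find the reports that are archived",
}

query = "find the flows that are running"
print(hybrid_rank(query, OPERATIONS))             # hybrid
print(hybrid_rank(query, OPERATIONS, alpha=1.0))  # lexical only
```

With lexical matching alone, `findArchivedReports` ranks first because it shares filler words with the query. With the semantic term included, `listActiveExecutions` ranks first even though it shares no words with the query at all.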

The pattern in practice is search, then narrow, then act. The agent searches the connection for the handful of operations that fit its intent. It reads the schema for the one it chooses, and only that one. Then it invokes. The full specifications for the rest of the roughly eighteen hundred operations never enter the context window, because the agent never needed them. If the task needs three operations, it finds each of them by searching.
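The loop can be sketched in a few lines. Everything here is hypothetical: the catalogue, the placeholder word-match search, and the stubbed `invoke` all stand in for the real tools. The point is only to show what enters the context and what does not.

```python
import json

# Hypothetical catalogue: operation name -> (summary, full schema).
CATALOGUE = {
    "listActiveExecutions": ("List executions in progress", {"params": {"limit": "integer"}}),
    "cancelExecution": ("Cancel one execution", {"params": {"id": "string"}}),
    "listDonors": ("List donor records", {"params": {"since": "date"}}),
}


def search(query: str, limit: int = 2) -> list[str]:
    """Placeholder search: keep operations whose summary shares a word with the query."""
    words = set(query.lower().split())
    hits = [
        name for name, (summary, _schema) in CATALOGUE.items()
        if words & set(summary.lower().split())
    ]
    return hits[:limit]


def get_schema(name: str) -> dict:
    return CATALOGUE[name][1]


def invoke(name: str, params: dict) -> dict:
    """Stub standing in for the real API call."""
    return {"operation": name, "params": params, "status": "ok"}


def run_task(query: str, params: dict) -> tuple[dict, list[str]]:
    context: list[str] = []  # everything the model would actually read

    candidates = search(query)                      # 1. search
    context.append(json.dumps(candidates))

    chosen = candidates[0]                          # 2. narrow (a real agent picks by reasoning)
    context.append(json.dumps(get_schema(chosen)))

    result = invoke(chosen, params)                 # 3. act
    return result, context


result, context = run_task("executions in progress", {"limit": 10})
print(result["operation"], len(context))
```

The context ends up holding one short candidate list and one schema. The schemas for `cancelExecution` and `listDonors` are never serialized into it.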

This is the same principle the broader platform applies to its own tools. Rather than exposing every capability at once, tools are loaded on demand by relevance. The agent searches for the capability it needs and the matching tools arrive. Endpoint discovery is that same idea pushed down into each connected API. Whether the agent is finding a platform tool or finding an operation inside a CRM with more than five hundred of them, the move is identical: search for the capability, retrieve only what matches, leave the rest on disk.

Why this is the architecture that scales

The contrast is clearest when you imagine adding a twelfth API, then a twentieth. Under the load-everything model, each addition makes every task more expensive, because the documentation the agent wades through grows with the platform whether or not the new API is relevant to the question. Cost and latency climb, and the agent's accuracy degrades as the signal it needs is buried under specs it does not. The architecture punishes growth.

Under search-on-demand, the twentieth API costs the same per task as the second. The registry is larger, but the agent still searches, still retrieves a handful of candidates, still reads one schema, still acts. Context consumption is governed by the complexity of the task, not by the size of the platform. The architecture is indifferent to growth, which is exactly the property you want when the entire premise is connecting an agent to deep and wide infrastructure.
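A rough cost model shows the two curves side by side. Every constant below is an assumption picked for illustration, and the model counts only tokens that reach the context window, not the work the search index does on the platform side.

```python
# ASSUMPTIONS, all illustrative rather than measured:
ASSUMED_OPS_PER_API = 160          # average catalogued operations per connection
ASSUMED_TOKENS_PER_SCHEMA = 1_500  # one full operation schema
ASSUMED_TOKENS_PER_HIT = 40        # one search hit: name plus one-line summary


def load_everything_cost(apis: int) -> int:
    """Context tokens per task when every schema is loaded up front."""
    return apis * ASSUMED_OPS_PER_API * ASSUMED_TOKENS_PER_SCHEMA


def search_on_demand_cost(hits: int = 5, schemas_read: int = 1) -> int:
    """Context tokens per task when the agent searches, then reads what it uses.

    Note that the number of connected APIs is not a parameter.
    """
    return hits * ASSUMED_TOKENS_PER_HIT + schemas_read * ASSUMED_TOKENS_PER_SCHEMA


for apis in (2, 12, 20):
    print(f"{apis:>2} APIs: load-everything={load_everything_cost(apis):>9,}  "
          f"search-on-demand={search_on_demand_cost():,}")
```

Under these assumptions the load-everything cost grows from 480,000 to 4,800,000 tokens per task between the second API and the twentieth, while the search-on-demand cost stays at 1,700. A task that touches three operations would cost roughly three times that, which is still a function of the task and not of the platform.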

There is a quieter benefit that matters to anyone responsible for spend and reliability. An agent that reads only the operations it uses is an agent whose behavior is legible. You can see which connection it searched, which operation it chose, which schema it pulled. The reasoning path is narrow and inspectable, instead of a model swimming through megabytes of documentation and arriving somewhere you cannot reconstruct. A small footprint is not only cheaper. It is more auditable.
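One way to picture that legibility is as a short trace of tool calls. The record shape, tool names, and values below are hypothetical, not a documented Plexara log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditStep:
    tool: str        # which of the general tools was called
    connection: str  # which connection it targeted
    detail: str      # the query text or the operation name


# A hypothetical trace for one task: search, read one schema, invoke.
trace = [
    AuditStep("search_operations", "orchestration", "flows that are running"),
    AuditStep("get_operation_schema", "orchestration", "listActiveExecutions"),
    AuditStep("invoke_operation", "orchestration", "listActiveExecutions"),
]

# The questions a reviewer asks become simple filters over the trace.
connections_searched = {s.connection for s in trace if s.tool == "search_operations"}
schemas_pulled = [s.detail for s in trace if s.tool == "get_operation_schema"]

print(connections_searched, schemas_pulled)
```

Three rows are enough to reconstruct what the agent looked for, what it read, and what it called.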

What this means for the decision

The token economics of agent platforms are usually treated as an implementation detail. They are not. They are an architectural fork that determines whether your platform gets cheaper or more expensive to use as it grows. A platform that front-loads documentation has costs that scale with its own size. A platform built on a small tool footprint and semantic discovery has costs that scale with the task. Over a roadmap that adds connections every quarter, those two curves diverge sharply.

The right question to ask a vendor, or your own team, is simple. When you add the next ten integrations, does every existing workflow get more expensive, or does nothing change but the size of a registry the agent already knows how to search? The answer tells you whether you are buying something that compounds in your favor or against you.

That is the through-line of all three pieces. Context over raw tooling, because meaning is what makes answers true. Combination across layers, because diagnosis lives in the joins. And discovery over memorization, because a platform meant to see deep and wide has to stay light enough to actually do it.
