Token efficiency in enterprise MCP deployments

Quantifying token waste in typical MCP architectures

A typical enterprise MCP deployment exposes dozens to hundreds of tools. Each tool carries a description, parameter schema, and usage instructions. These tool definitions are sent to the LLM on every request as part of the context window. A deployment with 80 tools can consume 15,000 to 25,000 tokens in tool definitions alone, before the user has asked a question.
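As a rough check on those figures, the quoted range implies an average cost per tool definition. The sketch below only divides the numbers given above; the constant and function names are illustrative, and real definitions vary widely in length.

```python
# Figures quoted in the text: 80 tools, 15,000 to 25,000 tokens of definitions.
TOOL_COUNT = 80
DEFINITION_TOKENS_LOW = 15_000
DEFINITION_TOKENS_HIGH = 25_000


def per_tool_range(tool_count: int, low_total: int, high_total: int) -> tuple[float, float]:
    """Average definition cost per tool implied by a total token range."""
    return low_total / tool_count, high_total / tool_count


low, high = per_tool_range(TOOL_COUNT, DEFINITION_TOKENS_LOW, DEFINITION_TOKENS_HIGH)
print(f"{low:.1f} to {high:.1f} tokens per tool definition")
```

At roughly 190 to 310 tokens per tool, every tool the agent never uses is still paid for on each request.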

Beyond tool definitions, agents make redundant metadata fetches. When an agent queries a table, it receives raw results. To understand the results, it calls the catalog for column descriptions. Then ownership. Then quality scores. Then deprecation status. Each call consumes tokens for the request, the response, and the agent reasoning about what to do next. A single "describe this table" workflow can consume 3,000 to 5,000 tokens across multiple round trips.

Session-level waste compounds the problem. If an agent queries the same table twice in a conversation, it may re-fetch the same metadata. If it queries a related table, it may re-fetch overlapping context. Without session awareness, every interaction starts cold.

Where the tokens go

15–25K: tokens consumed by tool definitions alone before the user asks anything (deployment with 80 tools).

3–5K: tokens per describe-table workflow in a naive multi-call architecture (four-plus round trips).

40–60%: per-session token reduction with cross-enrichment, filtering, and dedup (vs. a naive MCP deployment).

380K: cumulative tokens saved per 20-exchange session by persona filtering alone (25K → 6K per request).

The compounding effect is the real story. A deployment that looks fine at one exchange bleeds at twenty.

Cross-enrichment consolidation

The first efficiency mechanism is cross-enrichment: enriching every tool response with context from complementary services. When an agent describes a table through Trino, the response includes DataHub metadata automatically. One enriched response does the work of five or more separate tool calls.

The token savings are direct. Four tool calls averaging 800 tokens each (request + response + reasoning) consume 3,200 tokens. One enriched response consumes 1,200 tokens. The savings scale linearly with the number of tables and datasets an agent interacts with in a session.
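The arithmetic behind that claim can be written out as a small sketch. It uses only the figures from the paragraph above (four calls at 800 tokens each, or one enriched response at 1,200 tokens); the function names are illustrative.

```python
# Figures from the text.
CALLS_PER_DESCRIBE = 4
TOKENS_PER_CALL = 800        # request + response + reasoning
ENRICHED_TOKENS = 1_200      # one enriched response


def describe_cost(tables: int, enriched: bool) -> int:
    """Tokens spent describing `tables` distinct tables once each."""
    per_table = ENRICHED_TOKENS if enriched else CALLS_PER_DESCRIBE * TOKENS_PER_CALL
    return tables * per_table


def savings(tables: int) -> int:
    """Tokens saved by enrichment; grows linearly with the table count."""
    return describe_cost(tables, enriched=False) - describe_cost(tables, enriched=True)


for n in (1, 5, 10):
    print(n, savings(n))
```

Each additional table adds the same 2,000-token saving, which is what "scale linearly" means here.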

Cross-enrichment also eliminates the agent reasoning overhead between calls. Without enrichment, the agent must decide what additional context to fetch, formulate the request, interpret the response, and decide if more context is needed. With enrichment, the agent receives complete context in a single response and proceeds directly to answering the question.
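A minimal sketch of the idea follows. It is not Plexara's implementation and calls no real Trino or DataHub API: `SCHEMAS` and `CATALOG` are hypothetical in-memory stand-ins for the query engine and the catalog, and `describe_table_enriched` is an assumed name.

```python
# Hypothetical stand-in for what the query engine knows about a table.
SCHEMAS = {
    "sales.orders": [("order_id", "bigint"), ("amount", "decimal(12,2)")],
}

# Hypothetical stand-in for what the metadata catalog knows about it.
CATALOG = {
    "sales.orders": {
        "column_descriptions": {
            "order_id": "Primary key",
            "amount": "Order total in USD",
        },
        "owner": "data-platform",
        "quality_score": 0.97,
        "deprecated": False,
    },
}


def describe_table_enriched(table: str) -> dict:
    """Return schema plus catalog context in a single response."""
    catalog = CATALOG.get(table, {})
    descriptions = catalog.get("column_descriptions", {})
    return {
        "table": table,
        "columns": [
            {"name": name, "type": dtype, "description": descriptions.get(name)}
            for name, dtype in SCHEMAS[table]
        ],
        # Context the agent would otherwise fetch with separate calls.
        "owner": catalog.get("owner"),
        "quality_score": catalog.get("quality_score"),
        "deprecated": catalog.get("deprecated"),
    }


print(describe_table_enriched("sales.orders"))
```

The merge happens on the server side, so the agent never has to decide which follow-up calls to make.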

Naive vs. enriched token budget

Naive multi-tool MCP

Per-session budget: 65,000 tokens

Tool definitions: 25,000 tokens (80 tools, every request)
Round-trip chatter: 16,000 tokens (4× describe/ownership/quality)
Redundant re-fetches: 9,000 tokens (no session awareness)
Work tokens: 15,000 tokens (actual reasoning + answer)

Plexara enriched MCP

Per-session budget: 26,000 tokens

Tool definitions: 6,000 tokens (filtered to analyst persona)
Enriched single calls: 5,000 tokens (one response, full context)
Redundant re-fetches: 0 tokens (repeats suppressed by dedup)
Work tokens: 15,000 tokens (same answer, same model)
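The two budgets can be checked by summing their line items. The sketch below uses only the figures listed above; the dictionary keys are illustrative labels.

```python
NAIVE = {
    "tool_definitions": 25_000,
    "round_trip_chatter": 16_000,
    "redundant_refetches": 9_000,
    "work_tokens": 15_000,
}

ENRICHED = {
    "tool_definitions": 6_000,
    "enriched_single_calls": 5_000,
    "redundant_refetches": 0,
    "work_tokens": 15_000,
}


def reduction(naive: dict, enriched: dict) -> float:
    """Fraction of the naive budget that the enriched budget removes."""
    naive_total = sum(naive.values())
    enriched_total = sum(enriched.values())
    return (naive_total - enriched_total) / naive_total


print(sum(NAIVE.values()), sum(ENRICHED.values()), f"{reduction(NAIVE, ENRICHED):.0%}")
```

The line items sum to 65,000 and 26,000 tokens, a 60 percent reduction, which is the upper end of the 40 to 60 percent range. The work tokens are identical in both budgets; all of the saving comes from overhead.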

Tool visibility filtering

The second mechanism is tool visibility filtering. Persona-based access control determines which tools an agent can see, not just which tools it can call. An analyst persona might see 15 query and catalog tools. The same deployment might expose 60 tools total, but the analyst agent never receives the other 45 tool definitions.
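One way to picture this is a filter applied before the tool list is returned to the agent. The sketch below is an assumption about how such a filter could look, not the platform's actual mechanism: the `Tool` type, the `category` field, and the `PERSONA_CATEGORIES` mapping are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    category: str  # hypothetical grouping used for persona rules


# Hypothetical persona policy: which tool categories each persona may see.
PERSONA_CATEGORIES = {"analyst": {"query", "catalog"}}


def visible_tools(tools: list[Tool], persona: str) -> list[Tool]:
    """Return only the definitions this persona should ever receive."""
    allowed = PERSONA_CATEGORIES.get(persona, set())
    return [tool for tool in tools if tool.category in allowed]


# 60 tools in total, 15 of them relevant to an analyst, as in the example above.
ALL_TOOLS = (
    [Tool(f"query_{i}", "query") for i in range(8)]
    + [Tool(f"catalog_{i}", "catalog") for i in range(7)]
    + [Tool(f"admin_{i}", "admin") for i in range(45)]
)

analyst_tools = visible_tools(ALL_TOOLS, "analyst")
print(len(ALL_TOOLS), len(analyst_tools))  # 60 15
```

An unknown persona sees no tools at all in this sketch, which is a deliberately conservative default.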

This reduces the tool definition overhead from 25,000 tokens to 6,000 tokens on every request, a saving of 19,000 tokens per request. Over a session with 20 exchanges, the cumulative savings reach 380,000 tokens. Because tool definitions are billed as input tokens on each request, this translates directly to reduced cost per session.
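The cumulative figure follows from the per-request difference multiplied by the number of exchanges. A short sketch using only the numbers above (the function name is illustrative):

```python
def cumulative_savings(unfiltered: int, filtered: int, exchanges: int) -> int:
    """Definition tokens saved over a session when the overhead is paid per request."""
    return (unfiltered - filtered) * exchanges


print(cumulative_savings(25_000, 6_000, 1))   # 19000
print(cumulative_savings(25_000, 6_000, 20))  # 380000
```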

Visibility filtering also improves agent accuracy. An agent with 15 relevant tools makes better tool selection decisions than one parsing 60 tool descriptions. Fewer options mean less reasoning overhead and fewer incorrect tool selections that waste tokens on failed or irrelevant calls.

Three mechanisms

M01, at the response: cross-enrichment. Every tool response is enriched with context from complementary services. Four calls collapse to one. 3,200 → 1,200 tokens per describe flow.

M02, at the schema: visibility filtering. Persona-based access control determines which tools an agent can see, not just call. Unused tool descriptions never enter the context window. 25,000 → 6,000 tokens per request.

M03, across the turn: session dedup. Metadata provided earlier in a conversation is not re-sent on subsequent calls against the same entity. Enrichment stays active, duplication is suppressed. Compounds with session length.

Each mechanism addresses a different source of waste. Together they produce sessions that accomplish more with fewer tokens.

Session-aware deduplication

The third mechanism is session-aware deduplication. The platform tracks which metadata context has been provided within a conversation. If an agent described a table earlier in the session, subsequent queries against that table do not re-send the same metadata. The enrichment is still active, but duplicated context is suppressed.
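A minimal sketch of that bookkeeping, under stated assumptions: the `SessionContext` class and its `dedup` method are hypothetical names, and fingerprinting the metadata with a hash is one possible way to detect repeats, not necessarily how the platform does it.

```python
import hashlib
import json


class SessionContext:
    """Tracks which metadata has already been sent in one conversation."""

    def __init__(self) -> None:
        # entity name -> fingerprint of the metadata last sent for it
        self._sent: dict[str, str] = {}

    def dedup(self, entity: str, metadata: dict) -> dict:
        """Return full metadata the first time, a short marker on repeats."""
        serialized = json.dumps(metadata, sort_keys=True).encode()
        fingerprint = hashlib.sha256(serialized).hexdigest()
        if self._sent.get(entity) == fingerprint:
            return {"entity": entity, "metadata": "unchanged, provided earlier in this session"}
        self._sent[entity] = fingerprint
        return {"entity": entity, "metadata": metadata}


session = SessionContext()
meta = {"owner": "data-platform", "quality_score": 0.97}
first = session.dedup("sales.orders", meta)
second = session.dedup("sales.orders", meta)
print(first)
print(second)
```

Comparing fingerprints rather than just entity names means that metadata which changes mid-session is sent again in full.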

Deduplication is particularly effective in exploratory sessions where an agent queries multiple tables in the same schema or follows lineage across related datasets. Overlapping metadata (shared owners, common tags, related glossary terms) is provided once and referenced subsequently.
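Overlap across different tables can be handled at the level of individual facts rather than whole entities. The sketch below is one hypothetical way to do that: each distinct fact gets an identifier the first time it appears, and later responses carry only a reference. The `SharedContext` class and the `ctx-N` identifier scheme are assumptions for illustration.

```python
import json


class SharedContext:
    """Sends each distinct metadata fact once per session, then refers to it."""

    def __init__(self) -> None:
        # serialized (key, value) fact -> reference id handed to the agent
        self._ids: dict[str, str] = {}

    def compact(self, metadata: dict) -> dict:
        compacted = {}
        for key, value in metadata.items():
            fact = json.dumps([key, value], sort_keys=True)
            if fact in self._ids:
                # Already provided in this session: send only the reference.
                compacted[key] = {"ref": self._ids[fact]}
            else:
                ref = f"ctx-{len(self._ids) + 1}"
                self._ids[fact] = ref
                compacted[key] = {"id": ref, "value": value}
        return compacted


shared = SharedContext()
orders = shared.compact({"owner": "data-platform", "tags": ["pii", "finance"]})
customers = shared.compact({"owner": "data-platform", "tags": ["pii"]})
print(orders)
print(customers)
```

A reference only pays off when the value it replaces is longer than the reference itself, so in practice this would suit long glossary definitions better than short owner names.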

The combination of all three mechanisms reduces per-session token consumption by 40 to 60 percent compared to a naive multi-tool MCP deployment. For organizations running thousands of agent sessions per day, this represents a meaningful reduction in LLM API costs.


Why fewer, richer tools outperform many narrow ones

The MCP ecosystem trend is toward tool proliferation: one tool per API endpoint, resulting in MCP servers with 50 to 200 narrow tools. Each tool does one thing. The agent must orchestrate multiple tools to accomplish any useful task.

Plexara takes the opposite approach: fewer tools that return richer responses. A single describe-table tool returns schema, business context, quality signals, ownership, lineage, and deprecation warnings. The agent receives everything it needs in one call and can proceed to answering the question.

This architectural choice compounds. Fewer tools shrink the definition overhead paid on every request, and richer responses cut both the round trips and the reasoning the agent spends between them. That compounding is where the 40 to 60 percent per-session reduction comes from.
