Tokens and your budget

What you will take away from this lesson

Most Plexara users are on a subscription plan. Claude Pro, Claude Max, Claude Team, Claude Enterprise, or an equivalent tier from another vendor. That means the question is almost never "how do I lower my monthly bill." The question is "how do I get the most work done inside the usage limits my plan already pays for." More questions answered, more reports written, more dashboards built, more knowledge captured, all before the session and weekly limits kick in.

Tokens are how those limits are measured. A working mental model of tokens is the difference between blowing through a week's usage on Wednesday and finishing Friday with room to spare.

Learning Objectives

01Explain the difference between input tokens and output tokens and why output weighs heavier against a subscription plan.
02Estimate what a typical Plexara session consumes, broken down by the steps the agent actually takes.
03Identify the three ceilings a subscription enforces: context window per request, session limit per rolling multi-hour window, and weekly plan limit.
04Recognize the prompting patterns that burn through a plan and the ones that stretch it.
05Understand why the discover-query-enrich workflow gets more questions, reports, and dashboards out of the same plan.

Two meters: input tokens and output tokens

Every interaction with a frontier model has two meters running in parallel. Input tokens are everything the model reads: the system prompt, the operating manual, prior turns in the conversation, any documents or catalog results that got pulled in, and the current user message. Output tokens are what the model writes back.

The two meters weigh differently against your plan. Input tokens count for comparatively little. Output tokens count for significantly more, usually between three and five times as much per token. The shape of this asymmetry is stable across providers and plans. If you only remember one thing from this section, remember that the length of the response is what decides how much of your plan each question consumes.

Input tokens and output tokens weigh differently against your plan

Input tokens1× baseline

Your prompt, the system instructions, prior turns, retrieved documents, and tool results.

Output tokenstypically 3 to 5×

Whatever the model generates. Weightings vary by provider and tier, but output almost always consumes more of a session and weekly plan limit than input.

Practical consequence: a concise answer consumes less of your plan than an exhaustive one. Asking the model to dump every row of a query result into the chat is one of the fastest ways to spend down a session budget.

What one Plexara question actually consumes

Abstract numbers do not help you make good decisions. Concrete numbers do. The breakdown below is the real anatomy of a first question inside a Plexara session, observed across typical operational queries against the ACME Corp demo catalog. Numbers vary by question and by connector, but the shape is stable.

Two observations worth internalizing. The platform_info load is a one-time cost per session. Follow-up questions in the same session do not repeat it, which is why long, focused sessions consume less per question than many short ones. The second observation: the trino_query result is the line most likely to blow up, because row count and column width are under your control but not the model's.

What one Plexara question actually spends

platform_info load
~6K
Once per session. The operating manual the agent needs to use the MCP correctly.
User prompt
20–200
Your actual question.
datahub_search result
500–2K
Catalog hits with descriptions, owners, tags, and lineage hints.
datahub_get_queries
500–1K
Curated query templates for the matched dataset.
trino_query result
highly variable
Depends on row count and width. Use trino_export for anything over a handful of rows.
Agent reasoning and final response
1K–5K
The model thinking plus the answer it writes to you.

First question in a session runs about 10K to 15K tokens end to end. Follow-up questions in the same session are cheaper, since platform_info and the catalog context are already loaded.

Repeat queries get cheaper: Plexara tracks what it has already enriched

The "follow-up questions are cheaper" line on the anatomy above is true for a reason specific to Plexara: the platform tracks which tables it has already enriched for you in the current session, and shrinks the enrichment on subsequent queries that touch the same tables.

When a Trino query returns rows from a table, Plexara attaches semantic metadata to the result (description, tags, owners, column descriptions and tags, glossary terms, PII flags) so the agent can reason safely about what it is looking at. On the first query against a given table, the full enrichment is sent. On later queries in the same session, when every referenced table has already been enriched recently, Plexara switches to a compact summary or a reference, keeping only the critical fields the agent still needs (warnings, data-quality signals, sensitivity flags).

The effect compounds over a working session. A typical analysis revisits the same three or four tables from a dozen different angles. The first question pays the full enrichment cost; every question after that pays a fraction of it. This is a Plexara behavior, not a client or provider behavior, and it is scoped tightly: only Trino result enrichment is deduplicated. The platform_info load, catalog search results, and memory recall are not affected.

How Plexara shrinks enrichment on repeat queries

Mode 1
Full enrichment
When: The first time a table appears in a Trino result this session
What it sends: Table description, tags, owners, deprecation status, per-column descriptions and tags, glossary terms, and PII flags.
Mode 2
Summary
When: The same table appears in a later query
What it sends: Only the fields the agent still needs to reason safely: warnings, data-quality signals, sensitivity flags. Descriptions and tags are not repeated.
Mode 3
Reference
When: Aggressive mode for very repeat-heavy sessions
What it sends: A one-line pointer noting that full metadata for this table was already sent earlier in the session. The agent can recall it from earlier context.

Scope: Plexara's enrichment dedup applies to the semantic metadata attached to Trino query results. The platform_info payload, catalog search results, and memory entries are not deduplicated. Tracking is per MCP session and resets when the session ends.

The three ceilings your subscription enforces

Subscription plans (Claude Pro, Max, Team, and the equivalents from other vendors) enforce usage through three distinct ceilings. Treating them as one number is the most common source of confusion.

The first is the context window, which is a per-request ceiling on how much the model can read at once. The second is the session limit, a rolling multi-hour window that Claude and similar clients use to pace usage across the day. The third is the weekly plan limit, the top-line cap that defines what a tier is actually selling. The three are independent. Staying inside all of them is what stretches a plan through a full work week.

Per request

Context window

Unit: Tokens
Typical: 200K to 1M on current frontier tiers
What happens when you hit it: Old turns fall out of the window or get summarized by the client.

Rolling multi-hour

Session limit

Unit: Tokens consumed inside a session window
Typical: Claude Pro and Max use a rolling window that resets several hours after your first message
What happens when you hit it: The client pauses new requests inside that client. The window rolls off and resets on its own.

Per week

Weekly plan limit

Unit: Tokens consumed across the week
Typical: Set by your subscription tier (Pro, Max, Team, Enterprise)
What happens when you hit it: You hit the weekly cap and lose access to the top-tier model for the rest of the week, or pay for additional usage.

Three distinct ceilings. A prompt that fits inside your session and weekly plan budget can still exceed the model's context window. The three are independent and all three have to be respected. API-direct users add a fourth ceiling: per-minute rate limits (TPM), which most subscription users never see.

Specificity is how you get more done per plan

Because output tokens weigh heavier than input tokens, the single most effective habit for getting more out of a plan is to ask for what you actually want. A targeted question produces a targeted answer. A vague question produces an exhaustive one, and the model has every incentive to be thorough.

Patterns that burn through your plan, patterns that stretch it

Most teams pick up the habits below within their first week on Plexara. They are worth learning on purpose rather than learning the hard way, which usually means running into a session limit on a Thursday afternoon with two dashboards left to build. None of them require discipline or prompt-engineering expertise; they are just the natural shape of a discover-query-enrich session.

Burns through your plan

Pasting an entire database schema into the prompt when the agent can discover the one table it needs.
Asking the agent to output raw rows when what you actually wanted was a summary or a chart.
Running every question inside one endless session instead of starting fresh for unrelated topics.
Using free-form SQL for aggregations the operating manual has already encoded as curated queries.
Repeating setup instructions every turn instead of relying on memory and platform_info.

Gets more done per plan

Let datahub_search and datahub_get_queries pick the right, pre-optimized query template.
Use trino_export for large result sets so rows never enter the conversation.
Name the dataset or the entity in your question. Specific prompts produce focused responses.
Ask for the insight or the summary, not the raw output.
Start new sessions for unrelated topics so platform_info and memory stay lean.

Why the default workflow stretches your plan

It is tempting to look at the token meter and reach for aggressive self-rationing: truncating context, summarizing every turn, keeping sessions artificially short. In practice, this is rarely necessary with Plexara because the recommended workflow is already the lean one.

The catalog returns small, precise context rather than whole schemas. The curated queries are pre-optimized and return the columns the agent actually needs. The export path keeps large result sets out of the conversation entirely. The agent stops when it has enough. The path that is most likely to produce a correct answer is also the path most likely to leave you with session and weekly budget to spare.

Where this leads

Tokens are the unit of consumption. The next lesson is about the unit of attention: what the context window actually is, what happens when a conversation fills it up, and why this is the direct reason Plexara has a persistent memory subsystem.

Key terms

Six terms cover the vocabulary you will see in plan pages, limit-reached messages, and provider documentation. Learning them now means those documents read as plain English.

Key Terms

Input token: A token the model reads. Everything you send it (system prompt, prior turns, retrieved documents, tool results, and the current user message) counts as input.
Output token: A token the model generates. The answer itself, plus any intermediate reasoning it writes. Weighs heavier against your session and weekly plan limits than input tokens do.
Session limit: A rolling multi-hour usage window used by Claude Pro, Max, and similar subscription tiers. Once the window is full, new requests pause inside that client until the window rolls off.
Weekly plan limit: The top-line cap on usage across a calendar week. Set by your subscription tier. Maximizing work inside this cap is what subscription users are actually budgeting against.
Context window: The per-request ceiling on how many tokens the model can process at once. Everything the model considers must fit inside this budget.
Semantic enrichment: Metadata Plexara attaches to tables in Trino query results: descriptions, tags, owners, column-level detail, glossary terms, and PII flags. Deduplicated within a session so repeat queries against the same tables do not resend the full metadata.
trino_export: The Plexara tool that writes a large query result to a persisted asset instead of returning the rows in the conversation. The primary mechanism for keeping a large analysis inside a reasonable token budget.

102 - Tokens and your budget

What you will take away from this lesson

Two meters: input tokens and output tokens

What one Plexara question actually consumes

Repeat queries get cheaper: Plexara tracks what it has already enriched

Full enrichment

Summary

Reference

The three ceilings your subscription enforces

Context window

Session limit

Weekly plan limit

Specificity is how you get more done per plan

Patterns that burn through your plan, patterns that stretch it

Why the default workflow stretches your plan

Where this leads

Key terms

Related reading

105 - What is an AI agent?

202 - Your first day with Plexara

203 - DataHub Search: describing the domain

Cookie Preferences