What you will take away from this lesson
Article 305 covered how the body of an asset changes over time. This article is about everything attached to the asset that is not the body: the descriptive fields the agent fills when it saves (name, description, tags), and the provenance Plexara records on its own (a snapshot of the tool calls the agent made during the producing session).
Provenance is worth understanding precisely, because what it captures and what it does not are both load-bearing. Plexara sits at the MCP boundary: tool invocations cross that boundary and get recorded; chat content does not. That gives you a useful audit trail of what tools the agent ran in your session, but it is a correlation, not a proof. The agent may have used those tool results, ignored them, or transformed them in ways provenance does not capture. This lesson opens the record, names what it can and cannot tell you, and shows how to use it well.
Learning Objectives
- 01 Distinguish the two kinds of metadata an asset carries: descriptive (name, description, tags) and provenance (the tool calls Plexara observed in the session).
- 02 Read an asset's provenance: the list of tool invocations Plexara recorded at the MCP boundary, with each call's name, timestamp, and parameters.
- 03 Understand the limits: Plexara does not see your chat with the agent, and the agent may have used, ignored, or transformed the captured calls. Provenance is the strongest available correlation, not a causal proof.
- 04 Recognize that provenance is captured at save time, attached to the asset, and not refreshed on subsequent content updates.
- 05 For Trino Export assets, read the extra fields the export captures automatically: the SQL, the source tables, the output format, and the row count.
Where this lesson sits in the curriculum
305 covered editing the content of an asset over time. 306 covers everything else attached to the asset: the metadata the agent fills and you can edit, and the metadata Plexara records on its own.
300 Series: getting more out of Plexara
- 301 Creating reports and dashboards: How to ask the agent for shareable work product instead of chat that scrolls away. The habits that decide whether your dashboard survives next week.
- 302 Exporting data: When you need a spreadsheet a teammate can sort, or a data file for another system, instead of a view.
- 303 Sharing your work: Two ways to share: a user share to a named teammate, or a public link anyone with the URL can open. Three controls on either: expiration, notice text, revocation.
- 304 Creating collections: Bundle a dashboard, a summary, and the underlying data into one navigable briefing. Build it during the session via the agent, or by hand on the portal's Collections page.
- 305 Editing what you already have: Update an asset in place so shared links keep working and history stays on one record. Metadata edits do not bump the version; content edits do; revert is append-only.
- 306 How an asset was built: The audit trail Plexara records at the MCP boundary: which tool calls the agent invoked, with what parameters, in the producing session. Captured at save time, readable by you or the agent.
- 308 Reproducible prompts: Save the starting instruction as a first-class prompt with named arguments via Manage Prompts (manage_prompt). Re-run it later with different values. Personal, persona, or global scope. What re-running does and does not guarantee.
The 300 series is practical recipes for working with Plexara day to day. It assumes the mental model from 205.
Two kinds of metadata
An asset's full record has two halves. One half is descriptive: the name, description, and tags. The agent fills these when it saves the asset, taking cues from your prompt and the session context; you can edit them afterward through the agent or the portal. The other half is provenance: the record of MCP tool calls Plexara observed during the producing session. The descriptive half is editable by the asset's owner; the provenance half is not editable by anyone. Most of this lesson is about the second half, because the first half is already familiar from 301 and 305.
The two kinds of metadata an asset carries
Descriptive metadata
- What it is
- Name, description, tags. The agent fills these on save based on your prompt and the session context. Indexed for search; shown on the Assets page and the asset viewer. Editable later through the agent or the portal without bumping the version.
- Who sets it
- The agent on save, taking cues from your prompt. You can edit any of it later (through the agent or the portal) and the platform never overwrites your edits on subsequent updates.
- Example
- name: "Q3 2025 regional sales review", tags: ["sales", "q3-2025", "regional"]
Provenance metadata
- What it is
- A snapshot of the MCP tool calls Plexara observed during the session before save_artifact harvested the buffer. Captured automatically; not editable by anyone. Includes each call's name, timestamp, and parameters, plus the session_id and user_id. Does not include chat content, tool responses, or agent reasoning.
- Who sets it
- The platform, on every save_artifact and trino_export call. You never write to it; you only read it. The record is an approximation of how the asset was built, not a step-by-step proof.
- Example
- tool_calls: [datahub_search → trino_query → trino_query]; session_id, user_id.
The descriptive side was covered in 301 (naming for findability) and 305 (editing metadata in place). The rest of this lesson is the provenance side.
The shape of a provenance record
Provenance is a small structure with two layers. At the top, it identifies the session and the user. Inside, it carries an ordered list of MCP tool calls Plexara observed during the session before save_artifact (or trino_export) harvested the buffer. Each call has three fields. Note that what gets recorded is the call (its name and parameters), not the response or the agent's use of it.
What the platform captures in provenance
For each tool call in the chain
tool_name
The MCP tool the agent called. Examples: datahub_search, trino_query, knowledge_search.
timestamp
RFC 3339 UTC timestamp of when the call ran. Lets you sequence events and correlate with audit logs.
parameters
The arguments the agent passed to the tool, as a JSON object. For trino_query, this includes the SQL. For datahub_search, the search query string and any filters.
At the top level
session_id
The Plexara session that produced the asset. Use this to find the conversation log that goes with the audit trail.
user_id
The authenticated user whose session ran the calls. It identifies the same person as the asset's owner field, captured at the moment of save; provenance stores the user's UUID, while the owner email lives on the asset record.
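The two layers map naturally onto typed data. Below is a minimal Python sketch that loads a record into small dataclasses; the class and function names (ToolCall, Provenance, load_provenance, parse_rfc3339_utc) are hypothetical illustrations, not a Plexara API, and only the field names come from the record described above.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class ToolCall:
    """One recorded MCP tool invocation: the call going in, never the response."""
    tool_name: str
    timestamp: datetime  # parsed from an RFC 3339 UTC string
    parameters: dict[str, Any]


@dataclass(frozen=True)
class Provenance:
    """Top level: who ran the session, plus the ordered list of calls."""
    session_id: str
    user_id: str
    tool_calls: list[ToolCall] = field(default_factory=list)


def parse_rfc3339_utc(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so normalize it to an explicit +00:00 offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def load_provenance(raw: str) -> Provenance:
    data = json.loads(raw)
    calls = [
        ToolCall(
            tool_name=c["tool_name"],
            timestamp=parse_rfc3339_utc(c["timestamp"]),
            parameters=c.get("parameters", {}),
        )
        for c in data.get("tool_calls", [])
    ]
    return Provenance(data["session_id"], data["user_id"], calls)


sample = (
    '{"session_id": "sess_2026_05_15_abc", '
    '"user_id": "550e8400-e29b-41d4-a716-446655440000", '
    '"tool_calls": [{"tool_name": "datahub_search", '
    '"timestamp": "2026-05-15T14:22:03Z", '
    '"parameters": {"query": "regional sales 2025", "type": "dataset"}}]}'
)
record = load_provenance(sample)
print(record.tool_calls[0].tool_name, record.tool_calls[0].timestamp.isoformat())
# datahub_search 2026-05-15T14:22:03+00:00
```

Parsing timestamps into timezone-aware datetimes is what makes sequencing and audit-log correlation straightforward: comparisons and differences work directly.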
The platform caps the per-session buffer at 100 tool calls. Normal analytical sessions stay well under that; if a session runs more than 100 tool calls before saving, the oldest are evicted and do not appear in the asset's provenance.
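The eviction rule behaves like a fixed-size queue. A minimal Python sketch, assuming each recorded call is a plain dict; Plexara's actual buffer implementation is not described in this lesson, and make_session_buffer is a hypothetical name.

```python
from collections import deque

BUFFER_CAP = 100  # the per-session cap described above


def make_session_buffer(cap: int = BUFFER_CAP) -> deque:
    # A deque with maxlen drops items from the left (the oldest)
    # once it is full, which matches "oldest evicted".
    return deque(maxlen=cap)


buffer = make_session_buffer()
for i in range(105):
    buffer.append({"tool_name": "trino_query", "parameters": {"n": i}})

print(len(buffer))                   # 100
print(buffer[0]["parameters"]["n"])  # 5: calls 0 through 4 were evicted
```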
What provenance is, and what it is not
Plexara records tool calls at the MCP boundary, which is where the agent talks to data tools. It does not record what crosses any other boundary: not your prompts to the agent, not the agent's responses to you, not the agent's internal reasoning. This is a deliberate privacy boundary. The cost is that provenance is a correlation, not a causal trace.
What Plexara records
- The name of every MCP tool the agent invoked in your session.
- The parameters the agent passed to each tool (the JSON arguments).
- A UTC timestamp for each call.
- The session_id and the authenticated user_id.
What Plexara does not record
- The text of your prompts to the agent.
- The agent's responses back to you.
- The agent's internal reasoning between turns.
- The content of the tool responses (only the parameters going in, not the data coming back).
- Which captured tool calls the agent actually used to build the asset, versus called and ignored.
- Any transformations the agent applied (rounding, filtering, joining results) that were not themselves separate tool calls.
Plexara sits at the MCP boundary. Tool invocations cross that boundary; chat content does not. This is deliberate: by not storing what you said or what the agent said back, the platform avoids becoming the system of record for your private analytical conversation. The cost is that provenance is a correlation between tool activity and the resulting asset, not a step-by-step proof of causation. Treat it as the strongest available record, with the caveats above.
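The boundary is easier to reason about as code: a wrapper that sits between the agent and a tool sees the tool name and arguments going in, so it can record them without ever handling your prompt, the agent's reply, or its reasoning. A minimal Python sketch, assuming a synchronous tool handler; recorded_call is a hypothetical name, not Plexara's middleware.

```python
import copy
from datetime import datetime, timezone
from typing import Any, Callable


def recorded_call(
    buffer: list[dict[str, Any]],
    tool_name: str,
    parameters: dict[str, Any],
    handler: Callable[[dict[str, Any]], Any],
) -> Any:
    """Record the outgoing call, run the tool, and return its result unrecorded."""
    buffer.append({
        "tool_name": tool_name,
        # RFC 3339 UTC with a trailing "Z", matching the example record.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Deep-copy so later changes to the caller's dict do not alter the record.
        "parameters": copy.deepcopy(parameters),
    })
    # The response goes back to the agent only; nothing about it is stored.
    return handler(parameters)


calls: list[dict[str, Any]] = []
rows = recorded_call(calls, "trino_query", {"sql": "SELECT 1", "limit": 10}, lambda p: [[1]])
print(rows)              # [[1]]
print(sorted(calls[0]))  # ['parameters', 'timestamp', 'tool_name']
```

Nothing in the record says what happened to rows afterward, which is exactly why provenance is a correlation rather than a causal trace.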
A real provenance record
Abstract field lists are easier to follow with a concrete example. The block below is the kind of record a Q3 sales dashboard would carry after a typical analysis session.
A real provenance record from a Q3 sales dashboard
{
  "session_id": "sess_2026_05_15_abc",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "tool_calls": [
    {
      "tool_name": "datahub_search",
      "timestamp": "2026-05-15T14:22:03Z",
      "parameters": { "query": "regional sales 2025", "type": "dataset" }
    },
    {
      "tool_name": "trino_query",
      "timestamp": "2026-05-15T14:23:11Z",
      "parameters": {
        "sql": "SELECT region, transaction_date, net_amount FROM sales.transactions WHERE quarter='Q3' AND year=2025",
        "limit": 10
      }
    },
    {
      "tool_name": "trino_query",
      "timestamp": "2026-05-15T14:24:55Z",
      "parameters": {
        "sql": "SELECT region, SUM(net_amount) FROM sales.transactions WHERE quarter='Q3' AND year IN (2024, 2025) GROUP BY region, year",
        "limit": 100
      }
    }
  ]
}

Read top to bottom, this is the agent's observed tool activity in the session: a catalog search, a small sanity-check query, a year-over-year aggregate. The most likely path from those calls to the dashboard's content runs through the year-over-year query, but Plexara cannot prove that from the record alone: the agent may have applied rounding, filtering, or other transformations between the query result and the rendered chart, and none of those transformations would appear here. Two field notes: user_id is the authenticated user's UUID (the owner email lives on the asset record itself, not in provenance), and save_artifact does not appear in tool_calls because the save is what harvests the buffer, not a recorded call.
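For readers who want to inspect a record directly, here is a minimal Python sketch that prints one line per recorded call, in the same shape the agent uses in the practice session later in this lesson. It relies only on the field names shown above; timeline is a hypothetical helper, and it makes no claim about which call fed the dashboard.

```python
import json

RECORD = """
{"session_id": "sess_2026_05_15_abc",
 "user_id": "550e8400-e29b-41d4-a716-446655440000",
 "tool_calls": [
  {"tool_name": "datahub_search", "timestamp": "2026-05-15T14:22:03Z",
   "parameters": {"query": "regional sales 2025", "type": "dataset"}},
  {"tool_name": "trino_query", "timestamp": "2026-05-15T14:23:11Z",
   "parameters": {"sql": "SELECT region, transaction_date, net_amount FROM sales.transactions WHERE quarter='Q3' AND year=2025", "limit": 10}},
  {"tool_name": "trino_query", "timestamp": "2026-05-15T14:24:55Z",
   "parameters": {"sql": "SELECT region, SUM(net_amount) FROM sales.transactions WHERE quarter='Q3' AND year IN (2024, 2025) GROUP BY region, year", "limit": 100}}
 ]}
"""


def timeline(raw: str) -> list[str]:
    """One line per recorded call: time, tool, and its SQL or search query."""
    record = json.loads(raw)
    lines = []
    for call in record["tool_calls"]:
        # Keep the HH:MM:SSZ part of the RFC 3339 timestamp.
        when = call["timestamp"][11:]
        params = call["parameters"]
        detail = params.get("sql") or params.get("query") or ""
        lines.append(f"{when} {call['tool_name']} {detail}")
    return lines


for line in timeline(RECORD):
    print(line)
```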
How to actually look at provenance
You do not have to manually parse the JSON. Two surfaces present the same record in a readable form.
When provenance is captured
The single most important fact about provenance is that it is captured once, at save time, and is not refreshed after that.
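The snapshot-then-freeze behavior can be modeled in a few lines. A minimal Python sketch under stated assumptions: Asset, snapshot_on_save, and update_content are hypothetical names, not Plexara's storage model; the sketch only shows that later session activity and later content edits leave the saved provenance untouched.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class Asset:
    content: str
    provenance: dict[str, Any]
    version: int = 1


def snapshot_on_save(buffer: list[dict[str, Any]], content: str,
                     session_id: str, user_id: str) -> Asset:
    # Deep-copy the buffer so calls made after the save cannot leak in.
    snapshot = {"session_id": session_id, "user_id": user_id,
                "tool_calls": copy.deepcopy(buffer)}
    return Asset(content=content, provenance=snapshot)


def update_content(asset: Asset, new_content: str) -> None:
    # Content edits bump the version (see 305) but leave provenance as saved.
    asset.content = new_content
    asset.version += 1


buffer = [{"tool_name": "trino_query", "parameters": {"sql": "SELECT 1"}}]
asset = snapshot_on_save(buffer, "v1 chart", "sess_1", "user_1")
buffer.append({"tool_name": "trino_query", "parameters": {"sql": "SELECT 2"}})
update_content(asset, "v2 chart")
print(asset.version, len(asset.provenance["tool_calls"]))  # 2 1
```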
The 100-call cap
The provenance middleware caps the per-session buffer so that a runaway session does not produce an asset whose audit record is megabytes of JSON. The cap is generous enough that almost no real analytical session hits it.
Trino Export adds extra fields
For data exports specifically, the trino_export entry in provenance carries four fields that other tool calls do not. These exist because an export is meant to be reproducible, and reproducibility needs the actual SQL.
Trino Export adds four fields to provenance that other tools do not
export_query
The SQL that produced the export, captured verbatim. You can read it later, copy it into a notebook, or hand it to a colleague to re-run.
source_tables
The tables Plexara extracted from the SQL. Used for sensitivity-tag inheritance (302) and useful as a quick "what data sources fed this" reference.
format
csv, json, markdown, or text. Records which shape the export was produced in.
row_count
How many rows were in the export at save time. Lets a reviewer compare today's row count against the historical one.
These fields land in the last entry of the provenance tool_calls array, under the trino_export tool_name. Combined with the session's earlier tool calls, they give you the recorded path from the session's first tool call to the downloadable file, along with the exact query that produced it.
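A reviewer's check can be scripted from those four fields. A minimal Python sketch, assuming the fields sit as keys on the trino_export entry itself (with a fallback to its parameters, since the exact nesting is not shown in this lesson); export_fields and row_count_drift are hypothetical helpers, and the sample record below is illustrative, not real output.

```python
from typing import Any

EXPORT_FIELDS = ("export_query", "source_tables", "format", "row_count")


def export_fields(provenance: dict[str, Any]) -> dict[str, Any]:
    """Pull the four trino_export fields from the last tool_calls entry."""
    calls = provenance.get("tool_calls", [])
    if not calls or calls[-1].get("tool_name") != "trino_export":
        raise ValueError("last recorded call is not a trino_export entry")
    entry = calls[-1]
    # Assumption: fields sit on the entry itself; fall back to its parameters.
    params = entry.get("parameters", {})
    return {k: entry.get(k, params.get(k)) for k in EXPORT_FIELDS}


def row_count_drift(saved: int, today: int) -> float:
    """Relative change from the row count recorded at save time."""
    if saved == 0:
        return 0.0 if today == 0 else float("inf")
    return (today - saved) / saved


# Hypothetical sample record for illustration only.
prov = {"tool_calls": [
    {"tool_name": "trino_query", "parameters": {"sql": "SELECT 1"}},
    {"tool_name": "trino_export",
     "export_query": "SELECT * FROM sales.transactions",
     "source_tables": ["sales.transactions"], "format": "csv", "row_count": 1200},
]}
fields = export_fields(prov)
print(fields["format"], fields["row_count"])      # csv 1200
print(row_count_drift(fields["row_count"], 1260))  # 0.05
```

A large drift is not an error in itself; it is a prompt to ask whether the underlying table changed or the query did.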
The three questions provenance answers
Provenance is useful because it answers questions a stakeholder will eventually ask. The honest version of each answer is hedged: some parts of the record are exact (who and when), and some are inferences (which observed query produced which number). Three questions come up often enough to memorize both the answer and the hedge.
Questions provenance helps you answer (with the right hedge on each)
"Where did this number come from?"
Provenance shows which queries the agent ran and what they asked for. That is usually enough to point at the source: a trino_query against a specific table with a specific filter. The honest framing is that provenance gives you a strong correlation, not a step-by-step proof, since the agent may have transformed the query result on its way into the chart.
"Who had access to produce this?"
The user_id and session_id at the top of the provenance, plus the asset's owner_email field. These are exact: they identify the authenticated user whose session produced the asset. Compliance can correlate them with the platform audit log to confirm access controls applied at the time.
"Can I reproduce this analysis next quarter?"
The captured SQL is your starting point. Copy the export_query (for exports) or the trino_query parameters into a new session, parameterize the date, and the agent will produce a comparable asset. Whether it matches exactly depends on whether the original session applied transformations the provenance does not record. Article 308 covers reproducible prompts in depth.
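The "parameterize the date" step can be as simple as swapping the literal filters in the captured SQL. A minimal Python sketch, assuming the SQL uses literal quarter='Qn' and year=YYYY filters like the example record; redate_sql is a hypothetical helper, and a regex rewrite is a convenience, not a SQL parser.

```python
import re


def redate_sql(sql: str, quarter: str, year: int) -> str:
    """Swap a quarter='Qn' literal and a year=YYYY literal in captured SQL."""
    # \b stops the pattern from matching inside names such as fiscal_year.
    sql = re.sub(r"\bquarter\s*=\s*'Q[1-4]'", f"quarter='{quarter}'", sql)
    sql = re.sub(r"\byear\s*=\s*\d{4}", f"year={year}", sql)
    return sql


captured = ("SELECT region, transaction_date, net_amount FROM sales.transactions "
            "WHERE quarter='Q3' AND year=2025")
print(redate_sql(captured, "Q4", 2025))
```

It does not touch filters like year IN (2024, 2025), which need a different rewrite; for repeated runs, a saved prompt with named arguments (308) is the sturdier route.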
What this looks like in practice
A real session where a board member asked where a number came from, and the agent answered from the asset record (not by re-running the analysis).
Answering a where-did-this-number-come-from question
You
A board member asked where the Southwest Q3 number on the regional sales dashboard came from. Open the asset and show me its provenance.
Agent
The Q3 2025 regional sales review (asset_id asset_01HK7R8Z8M0Y6A5G1R6FQ2VQNK) was built by [email protected] on 2026-05-15 in session sess_2026_05_15_abc. Three tool calls were recorded; the save itself is implicit (it is the act that harvested the provenance buffer, not a recorded call).
14:22:03Z datahub_search query="regional sales 2025"
14:23:11Z trino_query (sanity check, LIMIT 10)
14:24:55Z trino_query SELECT region, SUM(net_amount) FROM sales.transactions WHERE quarter='Q3' AND year IN (2024, 2025) GROUP BY region, year
Based on the recorded tool calls, the most likely source of the Southwest Q3 2025 figure is the year-over-year trino_query (the second of the two trino_query calls), which aggregated sales.transactions across Q3 of 2024 and 2025, grouped by region and year. The Southwest row of that result, for year=2025, is consistent with what the dashboard shows. I cannot confirm causation from provenance alone: the record captures which queries ran, not how their results flowed into the rendered chart. If the board member needs a stricter check, the SQL is here for re-execution. Note: this is the v1 provenance. The asset has been updated to v3 since; for what changed at v2 and v3, use the version history.
The agent did not regenerate anything. It read the existing provenance record on the asset and translated it into prose, including the hedge about what the record cannot prove. The honest answer is more useful than a confident wrong one: the SQL is available for re-execution, and the user_id and session_id are exact, even if the step-by-step content path is an inference.
Working with provenance day to day
Three habits make the provenance feature useful instead of theoretical. None of them require knowing the storage format; they are prompt shapes.
Phrase a stakeholder question as a provenance question. "Where did this number come from?" becomes "open the Q3 dashboard and show me its provenance." The agent reads the record and translates it into prose.
Quote the SQL when you reproduce. If you want a new asset built the same way last quarter's was, ask the agent to read the export_query from the prior asset's provenance and re-run it with adjusted dates. Article 308 makes this its own subject.
Treat session_id as the link back to the chat log. When the audit story needs more than what is in provenance, the session_id is the breadcrumb that gets you to the original conversation.
What 308 covers
Provenance answers how a single asset was built. The natural follow-up is how to run that same work again next quarter without re-typing the instruction. 308 covers the prompt object: how to save the starting instruction as a reusable template, how to parameterize the parts that change, and the limits of what re-running guarantees.
Key terms
Six terms cover the vocabulary of provenance. The first four apply to every asset; the last two are Trino Export specifics.
- Provenance
- The audit record attached to an asset: the ordered list of MCP tool calls Plexara observed during the producing session, plus session_id and user_id. Captured automatically on save_artifact and trino_export. A correlation between observed activity and the saved asset, not a causal proof.
- tool_calls
- The array of recorded tool invocations in the asset's provenance. Each entry has tool_name, timestamp (RFC 3339 UTC), and parameters. Capped at 100 entries per session.
- session_id
- The Plexara session that produced the asset. Lets you correlate the asset with the conversation log it came from.
- user_id
- The authenticated user whose session produced the asset. Mirrors the asset's owner field at save time.
- export_query (trino_export only)
- The SQL captured verbatim in the trino_export entry of a data export's provenance. Read it later to reproduce the export, or hand it to a colleague to verify.
- source_tables (trino_export only)
- The list of warehouse tables Plexara extracted from the export SQL. Used for sensitivity-tag inheritance and as a quick reference for which sources fed an asset.
