Protocols outlast products

MCP, Trino, and DataHub are open protocols with communities larger than any vendor. Building on protocols, not proprietary platforms, is the durable choice.

10-minute read · Philosophy

The protocol vs. product distinction

SQL outlasted the proprietary query languages it competed with. HTTP outlasted the proprietary online services of its era. SMTP outlasted the closed messaging systems that preceded it. The pattern is consistent across four decades of computing: open protocols survive; proprietary alternatives get acquired, sunset, or abandoned.

This is not a philosophical preference. It is an economic observation. Protocols accumulate investment from many participants. Each implementation strengthens the ecosystem. Each integration raises the cost of switching away from the protocol, but that lock-in is to a shared standard, not to any single vendor. Products accumulate investment from one vendor. When that vendor changes strategy, gets acquired, or fails, every customer pays the re-platforming cost.

Enterprise technology decisions made today will be evaluated against the same pattern. The AI infrastructure being built now will either rest on durable open standards or on proprietary foundations that will require replacement within a product cycle.

Forty years of pattern

  1. SMTP (1982)
     Displaced: MCI Mail, CompuServe mail, commercial X.400 services
     Closed messaging systems were absorbed into an open standard no one owned.

  2. HTTP (1989)
     Displaced: Gopher, WAIS, proprietary online services
     The web won because HTTP was implementable by anyone; nothing else was.

  3. SQL (1986)
     Displaced: QUEL, proprietary query DSLs
     Forty years in, SQL is still the lingua franca of data. Each vendor's DSL is not.

  4. MCP (2024, live now)
     Displaced: per-framework tool interfaces
     Before MCP, integrating a data source with each AI framework meant a separate project. Now it is one.

Proprietary alternatives get acquired, sunset, or abandoned. Protocols accumulate investment from many participants until the cost of not supporting them exceeds the cost of supporting them.

MCP as USB-C for AI

The Model Context Protocol standardizes how AI agents interact with external tools and data sources. Before MCP, every agent framework defined its own tool interface. Integrating a data source with Claude, GPT, Gemini, and Llama required four separate implementations. MCP collapses this to one.

The analogy to USB-C is precise. Before USB-C, every device manufacturer chose its own connector. Users accumulated drawers full of incompatible cables. USB-C did not make better cables. It made cables interchangeable. MCP does the same for AI tool integration: any MCP-compatible client can connect to any MCP server. The client and server evolve independently.
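The arithmetic behind "four implementations collapse to one" can be sketched in a few lines. This is a back-of-the-envelope model, not a measurement: the function names are made up for illustration, and it assumes each framework would otherwise need its own adapter per data source.

```python
def adapters_without_protocol(frameworks: int, sources: int) -> int:
    """One bespoke adapter per (framework, data source) pair."""
    return frameworks * sources


def servers_with_protocol(sources: int) -> int:
    """One MCP server per data source, regardless of how many clients exist."""
    return sources


frameworks = ["Claude", "GPT", "Gemini", "Llama"]  # the four named above

# One data source: four bespoke adapters versus one MCP server.
print(adapters_without_protocol(len(frameworks), 1), servers_with_protocol(1))

# Ten data sources: the bespoke count grows multiplicatively, the server count linearly.
print(adapters_without_protocol(len(frameworks), 10), servers_with_protocol(10))
```

Each client still has to implement MCP once, but that cost does not grow with the number of data sources.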

MCP adoption has grown from zero to tens of thousands of server implementations in under a year. Claude Desktop, Cursor, and custom agents built with SDKs in every major language all speak MCP. This adoption velocity matches historical patterns for protocols that reach critical mass. Once a protocol crosses the threshold where the cost of not supporting it exceeds the cost of supporting it, adoption accelerates and becomes self-reinforcing.

The USB-C analogy

AI clients: Claude Desktop, Cursor, custom agent (Python), custom agent (TS)
MCP: one protocol, any pair
MCP servers: Plexara, GitHub, Filesystem, Slack, …thousands more

MCP does not make better tool calls; it makes them interchangeable. Client and server evolve independently.

Trino's federation model

Trino federates SQL execution across data sources that were never designed to work together. A single query can join a PostgreSQL table with an Elasticsearch index, an Iceberg lakehouse, and a Cassandra cluster. Trino does not require data to be moved, copied, or transformed. It queries data where it lives.

This federation model has a specific architectural consequence for AI agents: they do not need to know where data is physically stored. An agent writes standard SQL. Trino routes the query to the correct data source, executes it, and returns results in a uniform format. The agent interacts with one query engine regardless of how many underlying data sources exist.
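A minimal sketch of what "the agent writes standard SQL" might look like in practice. Trino addresses tables as catalog.schema.table, so a cross-source join is ordinary SQL with fully qualified names. The catalog, schema, table, and column names below are hypothetical and depend entirely on how a given deployment configures its catalogs; `run_query` is an illustrative helper, not part of any library.

```python
# Hypothetical catalog, schema, table, and column names. Real ones depend on
# how each Trino catalog is configured in a given deployment.
FEDERATED_QUERY = """
SELECT c.customer_id, c.region, SUM(o.amount) AS total_amount
FROM postgresql.public.customers AS c
JOIN iceberg.sales.orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.region
"""


def run_query(cursor, sql: str = FEDERATED_QUERY) -> list:
    """Execute SQL through any DB-API 2.0 style cursor and return all rows."""
    cursor.execute(sql)
    return cursor.fetchall()
```

With the `trino` Python client installed, a suitable cursor would typically come from `trino.dbapi.connect(...).cursor()`; connection details are deployment-specific and omitted here. The query text contains nothing about where either table physically lives beyond the catalog prefix.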

Trino supports over 40 connectors. It is used in production at scale by organizations that process petabytes of data daily. The connector ecosystem continues to expand because Trino is an open-source project under the permissive Apache 2.0 license, not a product with a connector marketplace. Anyone can build a connector. Everyone benefits.

Federation, not migration

Agent: standard SQL
Trino: federated execution
Sources: PostgreSQL, Iceberg, Elasticsearch, Cassandra, Kafka, S3, MySQL, Snowflake, …40+

A single SQL query can join a PostgreSQL table, an Iceberg lakehouse, an Elasticsearch index, and a Cassandra cluster. Data is queried where it lives.

DataHub's metadata graph

DataHub models metadata as a graph: entities (datasets, dashboards, users, glossary terms), relationships (lineage, ownership, containment), and aspects (schema, descriptions, tags, quality signals). This graph structure captures the full context of an enterprise data estate in a way that flat catalogs cannot.

The graph model matters for AI agents because it supports traversal. An agent can start with a dataset, follow lineage to upstream sources, check quality signals on those sources, find the glossary terms that define the business concepts, and identify the data steward responsible for accuracy. This traversal provides the deep context that prevents incorrect queries.
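The traversal described above can be sketched with a toy in-memory graph. This is not DataHub's API or entity model: the entity names, edge labels, and aspect keys are invented for illustration, and a real agent would fetch the same information from the metadata service instead of a dictionary.

```python
from collections import deque

# Toy metadata graph. All identifiers are made up for illustration and are
# not DataHub URNs or aspect names.
EDGES = {
    "dataset:revenue_daily": {"upstream": ["dataset:orders_raw"],
                              "glossary": ["term:net_revenue"],
                              "owner": ["user:alice"]},
    "dataset:orders_raw": {"upstream": ["dataset:pos_events"],
                           "glossary": [], "owner": ["user:bob"]},
    "dataset:pos_events": {"upstream": [], "glossary": [], "owner": ["user:bob"]},
}
ASPECTS = {
    "dataset:revenue_daily": {"quality": "passing"},
    "dataset:orders_raw": {"quality": "failing"},
    "dataset:pos_events": {"quality": "passing"},
}


def upstream_closure(start: str) -> list[str]:
    """Breadth-first walk over 'upstream' lineage edges, excluding the start node."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for parent in EDGES.get(node, {}).get("upstream", []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def context_for(dataset: str) -> dict:
    """Collect glossary terms, owners, and upstream quality signals for a dataset."""
    return {
        "glossary": EDGES[dataset]["glossary"],
        "owners": EDGES[dataset]["owner"],
        "upstream_quality": {u: ASPECTS[u]["quality"] for u in upstream_closure(dataset)},
    }


print(context_for("dataset:revenue_daily"))
```

In this toy data, the walk surfaces a failing quality signal one hop upstream, which is the kind of context a flat catalog lookup on the starting dataset alone would miss.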

DataHub is one of the most widely adopted open-source metadata platforms, with contributions from organizations across industries. Its entity model is extensible: custom entity types, structured properties, and data contracts can capture domain-specific metadata without forking the platform. Building on DataHub means building on a metadata standard, not a vendor catalog.


Why each was chosen

MCP was chosen because it is the only protocol-level standard for AI tool integration with meaningful adoption. Trino was chosen because it is the only open federated query engine that supports the breadth of data sources enterprises actually use. DataHub was chosen because it is the only open metadata platform with a graph model rich enough to support bidirectional semantic enrichment.

Each choice was evaluated against proprietary alternatives. Proprietary query engines offer deeper integration with their own storage but cannot federate across sources from other vendors. Proprietary catalogs offer polished interfaces but lock metadata into closed ecosystems. Proprietary AI frameworks offer convenience but tie agents to specific model providers.

The common thread is portability. An enterprise that builds on MCP, Trino, and DataHub can change its LLM provider without changing its data infrastructure. It can add new data sources without re-architecting its agent layer. It can replace any individual component without disrupting the others. This modularity is not a theoretical benefit. It is how enterprises survive technology transitions.
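One way to picture this modularity is as code written against interfaces instead of concrete vendors. The sketch below is purely illustrative: `LLMProvider`, `QueryEngine`, and the stub classes are hypothetical names, and the stubs return canned values instead of calling any real model or engine.

```python
from typing import Protocol


class LLMProvider(Protocol):
    def to_sql(self, question: str) -> str: ...


class QueryEngine(Protocol):
    def run(self, sql: str) -> list[tuple]: ...


class StubProviderA:
    """Stand-in for one model provider; returns a canned query."""
    def to_sql(self, question: str) -> str:
        return "SELECT 1"


class StubProviderB:
    """Stand-in for a different model provider with the same interface."""
    def to_sql(self, question: str) -> str:
        return "SELECT 1"


class StubEngine:
    """Stand-in for the query layer; returns a canned result."""
    def run(self, sql: str) -> list[tuple]:
        return [(1,)]


def answer(question: str, llm: LLMProvider, engine: QueryEngine) -> list[tuple]:
    """The agent layer depends only on the two interfaces, not on a vendor."""
    return engine.run(llm.to_sql(question))


engine = StubEngine()
print(answer("How many orders?", StubProviderA(), engine))
print(answer("How many orders?", StubProviderB(), engine))  # provider swapped, engine untouched
```

Swapping the provider changes one argument; the engine object and the `answer` function are unchanged.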

The tradeoff

Single-vendor stack

Built on proprietary products

  • Deep integration inside one ecosystem, breakdown at the edges
  • Connector surface area controlled by one vendor
  • AI assistant tied to vendor compute and semantic layer
  • Re-platforming cost on every strategic pivot

MCP · Trino · DataHub

Built on open protocols

  • Change LLM providers without changing data infrastructure
  • Add data sources by adding a Trino connector
  • Replace any component without disrupting the others
  • Lock-in accrues to the protocol, not to a vendor

What happens when you build on proprietary alternatives

Every major data warehouse vendor has shipped an AI assistant. These assistants work well within their own ecosystem. They access the vendor catalog, query the vendor compute engine, and use the vendor semantic layer. The integration is seamless because everything is controlled by one company.

The problem emerges when the enterprise has data outside that ecosystem, which every enterprise does: a second warehouse, a legacy database, SaaS applications, object storage in a different cloud. The vendor assistant cannot reach this data. The enterprise now needs a second AI integration for the data outside the primary vendor, and a third for the data outside both. Each integration is proprietary, with its own configuration, its own security model, and its own limitations.

Building on protocols avoids this fragmentation. A single MCP server connected to a federated query engine accesses all data sources through one interface. The security model is unified. The metadata is centralized. The agent experience is consistent. When the enterprise adds a new data source, it adds a Trino connector. The rest of the stack is unchanged.
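On the wire, that one interface is a JSON-RPC 2.0 message. The sketch below builds an MCP `tools/call` request by hand to show its shape. The tool name `run_sql`, its `sql` argument, and the table name are assumptions for illustration; a real server advertises its own tool names and input schemas in response to `tools/list`, and a real client would normally use an MCP SDK instead of constructing messages manually.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value per request works


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool and table names, for illustration only.
request = tools_call_request(
    "run_sql",
    {"sql": "SELECT COUNT(*) FROM postgresql.public.customers"},
)
print(request)
```

Adding a data source behind the query engine changes which table names are valid inside the `sql` argument; the message envelope stays the same.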