The protocol vs. product distinction
SQL outlasted every proprietary query language. HTTP outlasted every proprietary network protocol. SMTP outlasted every proprietary messaging system. The pattern is consistent across five decades of computing: open protocols survive, while proprietary alternatives are acquired, sunset, or abandoned.
This is not a philosophical preference. It is an economic observation. Protocols accumulate investment from many participants. Each implementation strengthens the ecosystem. Each integration raises the cost of leaving the protocol, not the cost of leaving any single vendor. Products accumulate investment from one vendor. When that vendor changes strategy, gets acquired, or fails, every customer pays the re-platforming cost.
Enterprise technology decisions made today will be evaluated against the same pattern. The AI infrastructure being built now will either rest on durable open standards or on proprietary foundations that will require replacement within a product cycle.
Forty years of pattern
SMTP (1982)
Displaced: MCI Mail, CompuServe mail, proprietary X.400
Closed messaging systems were absorbed into an open standard no one owned.
HTTP (1989)
Displaced: Gopher, WAIS, proprietary on-line services
The web won because HTTP was implementable by anyone; nothing else was.
SQL (1986)
Displaced: QUEL, proprietary query DSLs
Forty years in and SQL is still the lingua franca of data. Each vendor's DSL is not.
MCP (2024, live now)
Displaced: per-framework tool interfaces
Before MCP, integrating a data source with each AI framework meant a separate project. Now it is one.
MCP as USB-C for AI
The Model Context Protocol standardizes how AI agents interact with external tools and data sources. Before MCP, every agent framework defined its own tool interface. Integrating a data source with Claude, GPT, Gemini, and Llama required four separate implementations. MCP collapses this to one.
The analogy to USB-C is precise. Before USB-C, every device manufacturer chose its own connector. Users accumulated drawers full of incompatible cables. USB-C did not make better cables. It made cables interchangeable. MCP does the same for AI tool integration: any MCP-compatible client can connect to any MCP server. The client and server evolve independently.
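The "any client, any server" property can be sketched in a few lines. MCP messages are JSON-RPC 2.0, and tools are discovered and invoked through the `tools/list` and `tools/call` methods. The sketch below is a deliberately simplified illustration, not a conforming server: it omits the initialization handshake, transports, and most error handling, and the `row_count` tool, the registry, and the function names are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory registry: tool name -> (description, input schema, handler).
TOOLS: dict[str, tuple[str, dict[str, Any], Callable[..., str]]] = {}


def register_tool(name: str, description: str, input_schema: dict[str, Any],
                  handler: Callable[..., str]) -> None:
    """Register a tool once; every client reaches it through the same two methods."""
    TOOLS[name] = (description, input_schema, handler)


def handle_request(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request for the two tool methods sketched here."""
    request = json.loads(raw)
    method = request.get("method")
    params = request.get("params", {})

    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": desc, "inputSchema": schema}
            for name, (desc, schema, _handler) in TOOLS.items()
        ]}
    elif method == "tools/call":
        _desc, _schema, handler = TOOLS[params["name"]]
        text = handler(**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        # Standard JSON-RPC "method not found" error code.
        return json.dumps({"jsonrpc": "2.0", "id": request.get("id"),
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "result": result})


def row_count(table: str) -> str:
    # Placeholder handler; a real tool would query a data source here.
    return f"row count requested for {table}"


register_tool(
    "row_count",
    "Return the number of rows in a table (illustrative only).",
    {"type": "object", "properties": {"table": {"type": "string"}}, "required": ["table"]},
    row_count,
)
```

The server side never mentions which model or agent framework is calling. Any client that can send a `tools/call` request with a `name` and `arguments` gets the same result, which is what lets the two sides evolve independently.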
MCP adoption has grown from zero to tens of thousands of server implementations in under a year. Claude Desktop, Cursor, and custom agents built with SDKs in every major language all speak MCP. This adoption velocity matches historical patterns for protocols that reach critical mass. Once a protocol crosses the threshold where the cost of not supporting it exceeds the cost of supporting it, adoption accelerates and becomes self-reinforcing.
Figure: The USB-C analogy. AI clients connect to MCP servers through MCP: one protocol, any pair.
Trino's federation model
Trino federates SQL execution across data sources that were never designed to work together. A single query can join a PostgreSQL table with an Elasticsearch index, an Iceberg lakehouse, and a Cassandra cluster. Trino does not require data to be moved, copied, or transformed. It queries data where it lives.
This federation model has a specific architectural consequence for AI agents: they do not need to know where data is physically stored. An agent writes standard SQL. Trino routes the query to the correct data source, executes it, and returns results in a uniform format. The agent interacts with one query engine regardless of how many underlying data sources exist.
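As a rough illustration of what "standard SQL" means here: Trino addresses every table as `catalog.schema.table`, so a cross-source join is written like any other join. The sketch below assumes two catalogs named `postgresql` and `iceberg` and invented schema, table, and column names; the host name is also a placeholder. The `run_on_trino` helper assumes the `trino` Python client package is installed and is not exercised here.

```python
def federated_orders_by_region_sql() -> str:
    """Build one SQL statement that joins tables living in two different systems.

    The catalog, schema, table, and column names are hypothetical.
    """
    return """
SELECT c.region, count(*) AS order_count
FROM postgresql.public.customers AS c
JOIN iceberg.sales.orders AS o
  ON o.customer_id = c.customer_id
GROUP BY c.region
ORDER BY order_count DESC
""".strip()


def run_on_trino(sql: str, host: str = "trino.example.internal",
                 port: int = 8080, user: str = "agent") -> list:
    """Submit the statement to a Trino coordinator and return all rows.

    Assumes the `trino` client package is available; host, port, and user
    are placeholders for illustration.
    """
    from trino.dbapi import connect  # imported lazily so the sketch loads without it

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()
```

The agent only ever produces the SQL string. Which system holds `customers` and which holds `orders` is encoded in the catalog prefix, not in the agent's code.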
Trino supports over 40 connectors. It is used in production at scale by organizations that process petabytes of data daily. The connector ecosystem continues to expand because Trino is an open-source project with a permissive license, not a product with a connector marketplace. Anyone can build a connector. Everyone benefits.
Figure: Federation, not migration. An agent sends standard SQL to Trino, which performs federated execution across PostgreSQL, Iceberg, Elasticsearch, Cassandra, Kafka, S3, MySQL, Snowflake, and other sources (40+ connectors).
DataHub's metadata graph
DataHub models metadata as a graph: entities (datasets, dashboards, users, glossary terms), relationships (lineage, ownership, containment), and aspects (schema, descriptions, tags, quality signals). This graph structure captures the full context of an enterprise data estate in a way that flat catalogs cannot.
The graph model matters for AI agents because it supports traversal. An agent can start with a dataset, follow lineage to upstream sources, check quality signals on those sources, find the glossary terms that define the business concepts, and identify the data steward responsible for accuracy. This traversal provides the deep context that prevents incorrect queries.
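The traversal described above can be sketched with a toy in-memory graph. This is not DataHub's API or entity model: the `Dataset` class, its fields, and the sample data are hypothetical stand-ins for entities, aspects, and lineage relationships, and the walk is a plain breadth-first search over upstream edges.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset entity and a few of its aspects."""
    name: str
    owner: str                      # data steward responsible for accuracy
    quality: str                    # simplified quality signal: "passing" or "failing"
    glossary_terms: list[str] = field(default_factory=list)
    upstreams: list[str] = field(default_factory=list)  # lineage edges


def upstream_context(graph: dict[str, Dataset], start: str) -> list[Dataset]:
    """Breadth-first walk over lineage edges, nearest upstream first.

    The starting dataset itself is not included in the result.
    """
    seen = {start}
    queue = deque(graph[start].upstreams)
    ordered: list[Dataset] = []
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        node = graph[name]
        ordered.append(node)
        queue.extend(node.upstreams)
    return ordered


# Invented sample data for illustration.
GRAPH = {
    "revenue_summary": Dataset("revenue_summary", "analytics-team", "passing",
                               ["Net Revenue"], ["orders_clean"]),
    "orders_clean": Dataset("orders_clean", "data-eng", "passing",
                            ["Order"], ["orders_raw", "refunds_raw"]),
    "orders_raw": Dataset("orders_raw", "platform", "failing"),
    "refunds_raw": Dataset("refunds_raw", "platform", "passing", ["Refund"]),
}

# Upstream sources whose quality signal should make an agent cautious.
failing = [d for d in upstream_context(GRAPH, "revenue_summary") if d.quality == "failing"]
```

In this sample, an agent about to query `revenue_summary` would learn that one upstream source is failing its quality checks and who owns it, before it writes any SQL.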
DataHub is the most widely adopted open metadata platform, with contributions from organizations across industries. Its entity model is extensible: custom entity types, structured properties, and data contracts can capture domain-specific metadata without forking the platform. Building on DataHub means building on a metadata standard, not a vendor catalog.
Why each was chosen
MCP was chosen because it is the only protocol-level standard for AI tool integration with meaningful adoption. Trino was chosen because it is the only open federated query engine that supports the breadth of data sources enterprises actually use. DataHub was chosen because it is the only open metadata platform with a graph model rich enough to support bidirectional semantic enrichment.
Each choice was evaluated against proprietary alternatives. Proprietary query engines offer deeper integration with their own storage but cannot federate across sources from other vendors. Proprietary catalogs offer polished interfaces but lock metadata into closed ecosystems. Proprietary AI frameworks offer convenience but tie agents to specific model providers.
The common thread is portability. An enterprise that builds on MCP, Trino, and DataHub can change its LLM provider without changing its data infrastructure. It can add new data sources without re-architecting its agent layer. It can replace any individual component without disrupting the others. This modularity is not a theoretical benefit. It is how enterprises survive technology transitions.
The tradeoff
Single-vendor stack (built on proprietary products)
- Deep integration inside one ecosystem, with breakdowns at the edges
- Connector surface area controlled by one vendor
- AI assistant tied to vendor compute and semantic layer
- Re-platforming cost on every strategic pivot
MCP · Trino · DataHub (built on open protocols)
- Change LLM providers without changing data infrastructure
- Add data sources by adding a Trino connector
- Replace any component without disrupting the others
- Switching cost is paid toward the protocol, not the vendor
What happens when you build on proprietary alternatives
Every major data warehouse vendor has shipped an AI assistant. These assistants work well within their own ecosystem. They access the vendor catalog, they query the vendor compute engine, they use the vendor semantic layer. The integration is seamless because everything is controlled by one company.
The problem emerges when the enterprise has data outside that ecosystem, which every enterprise does: a second warehouse, a legacy database, SaaS applications, or object storage in a different cloud. The vendor assistant cannot reach this data. The enterprise now needs a second AI integration for the data outside the primary vendor, and a third for the data outside both. Each integration is proprietary, with its own configuration, its own security model, and its own limitations.
Building on protocols avoids this fragmentation. A single MCP server connected to a federated query engine accesses all data sources through one interface. The security model is unified. The metadata is centralized. The agent experience is consistent. When the enterprise adds a new data source, it adds a Trino connector. The rest of the stack is unchanged.
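To make "it adds a Trino connector" concrete: in Trino, a data source becomes queryable by adding a catalog properties file (for example under `etc/catalog/`), whose file name becomes the catalog name. The sketch below renders such a file for the PostgreSQL connector. The helper name, the `crm` catalog, the JDBC URL, and the credentials are hypothetical placeholders.

```python
def render_catalog_properties(connector_name: str, properties: dict[str, str]) -> str:
    """Render the contents of a Trino catalog properties file.

    `connector.name` selects the connector; the remaining keys are
    connector-specific settings supplied by the caller.
    """
    lines = [f"connector.name={connector_name}"]
    lines += [f"{key}={value}" for key, value in properties.items()]
    return "\n".join(lines) + "\n"


# Hypothetical new source: a CRM database exposed as the catalog "crm",
# i.e. the contents of etc/catalog/crm.properties.
crm_catalog = render_catalog_properties("postgresql", {
    "connection-url": "jdbc:postgresql://db.example.internal:5432/crm",
    "connection-user": "trino_reader",
    "connection-password": "${ENV:CRM_DB_PASSWORD}",  # read from the environment
})
```

Once the catalog is loaded, tables in the new source are addressable as `crm.<schema>.<table>` in the same SQL the agent already writes. Nothing in the MCP server or the agent needs to change.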
