The Data Platform Is the AI Infrastructure: What That Means in Practice
There's a framing I keep coming back to when talking to clients about their AI strategy: the question isn't "should we build an AI platform?" It's "do we have the data infrastructure to support AI that actually works?" The models are commoditizing. The data is not.
This is the "Frontier Firm" argument that Microsoft is pushing, and it's the right argument even if you don't buy the Copilot stack.
Why the Data Layer Is the Competitive Moat
Most enterprises have access to roughly the same foundation models. Take GPT-5, Claude Sonnet 4.5, and Llama 4: for most enterprise tasks, the capability gap between them is smaller than people think. What actually produces different outcomes is the quality, freshness, and accessibility of the data you give the model. An agent querying well-governed, current, semantically rich data in Unity Catalog makes better decisions than the same model querying a poorly labeled, stale, inconsistently structured data lake.
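The data engineering work you've been doing for the last decade (schema enforcement, lineage tracking, quality checks, access control) turns out to be the foundation for AI that works reliably. That's not an accident: a model can only reason over the data it is given, so the guarantees you built for analytics are the same ones your agents depend on.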
What This Means for Your Next Infrastructure Decision
When you're evaluating AI infrastructure investments, the highest-leverage ones are usually data infrastructure investments: better catalog coverage, richer metadata, more reliable quality guarantees, fresher data. A model querying a well-governed Delta table will usually produce better agent outputs than a higher-parameter model querying a messy one.
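To make "quality guarantees" and "freshness" a little more concrete, here is a minimal sketch of a readiness check you might run before exposing a table to an agent. It is plain Python under stated assumptions: TableMetadata, readiness_issues, and the thresholds are hypothetical names and values chosen for illustration, not a Databricks or Unity Catalog API. In practice the inputs would come from your catalog and data quality monitoring.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class TableMetadata:
    """Hypothetical metadata record for one table; field names are illustrative only."""
    name: str
    last_updated: datetime               # time of the most recent successful write
    description: Optional[str]           # human-written table comment, if any
    owner: Optional[str]                 # accountable team or person, if any
    null_rates: Dict[str, float] = field(default_factory=dict)  # column -> fraction of nulls


def readiness_issues(
    meta: TableMetadata,
    now: datetime,
    max_staleness: timedelta = timedelta(hours=24),  # assumed freshness budget
    max_null_rate: float = 0.05,                     # assumed per-column null tolerance
) -> List[str]:
    """Return the reasons a table is not ready for agent use; an empty list means it passes."""
    issues: List[str] = []
    # Freshness: flag tables that have not been written within the staleness budget.
    if now - meta.last_updated > max_staleness:
        issues.append(f"{meta.name}: stale (last updated {meta.last_updated.isoformat()})")
    # Semantic metadata: agents rely on descriptions to pick the right table.
    if not meta.description:
        issues.append(f"{meta.name}: missing description")
    # Governance: every table an agent touches should have an accountable owner.
    if not meta.owner:
        issues.append(f"{meta.name}: missing owner")
    # Quality: flag columns whose null rate exceeds the tolerance.
    for column, rate in sorted(meta.null_rates.items()):
        if rate > max_null_rate:
            issues.append(f"{meta.name}.{column}: null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    return issues


if __name__ == "__main__":
    now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
    orders = TableMetadata(
        name="sales.orders",
        last_updated=datetime(2025, 1, 2, 6, 0, tzinfo=timezone.utc),
        description="One row per customer order.",
        owner="sales-data-eng",
        null_rates={"customer_id": 0.0, "region": 0.12},
    )
    for issue in readiness_issues(orders, now):
        print(issue)  # prints "sales.orders.region: null rate 12% exceeds 5%"
```

The point of returning a list of reasons rather than a bare pass/fail is that the same gaps the agent would stumble over become a to-do list for the data team.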
The teams I see getting the most out of AI on their data platform are the ones who took Unity Catalog and data quality seriously before they started building agents. The ones who didn't are discovering that the models expose every gap in their underlying data. As always, I'm here to help.