The Future of Data + AI (2025–2030)
DAIS 2025 is in three weeks. I've spent the last year building on the themes that emerged from the 2024 Summit: compound AI, governance expansion, open formats, and the cost reality of enterprise GenAI. Looking further out, to where the data + AI landscape is heading between 2025 and 2030, I want to be direct about what excites me, what worries me, and what I think the community needs to hear.
What Excites Me
The open format convergence is creating something that didn't exist five years ago: a genuine data commons layer. Picture data stored in Delta format, readable by Iceberg-compatible engines, governed by an open-source catalog spec, and shareable via an open protocol. For the first time, data engineering decisions don't carry the same vendor lock-in consequences they used to. You can write tables with Databricks today and, in principle, point a different engine at the same files in three years without a full data migration project. That's a genuine improvement in architectural optionality, and it will compound for years.
The governance tooling for AI is about to catch up to the regulation. The EU AI Act pushed hard, NIST published its AI Risk Management Framework, and sector-specific regulations are coming. What that pressure produces, usually with a lag, is tooling that makes compliance tractable. The same way SOC 2 pressure produced a generation of security tooling, AI regulatory pressure is going to produce a generation of AI governance tooling. The teams that build governance-first now will be ahead when that tooling arrives.
Domain-specific AI is going to produce the most durable competitive moats. Generic AI capabilities are commoditizing fast: GPT-5, whatever comes after it, and fine-tuned open-source models are going to be broadly accessible and roughly equivalent for most tasks. What's not commoditizing is your domain knowledge encoded in a well-curated feature store, your proprietary training data, your validated evaluation methodology, and your organizational ability to iterate on AI systems faster than your competitors. That combination (good data + good process + AI tooling) is harder to replicate than any model architecture.
What Worries Me
The governance debt is accumulating faster than organizations are paying it down. Every enterprise AI system that ships to production without proper documentation, monitoring, and auditability is a future remediation project. When the regulatory deadlines arrive, and the EU AI Act's high-risk provisions come with fixed application dates, the teams carrying governance debt are going to face remediation costs that make the original build cost look small. I'm already seeing this in financial services, where teams are scrambling to retrofit documentation and monitoring onto systems that have been running in production for 18 months.
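To make "governance debt" concrete, here is a minimal sketch of how a team might inventory it. Everything in it is an assumption for illustration: the `DeployedSystem` record, its three boolean fields, and the `debt_report` helper are hypothetical names, not part of any platform or regulation. It only covers the three gaps named above (documentation, monitoring, auditability) and reports how long each gap has been sitting in production.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DeployedSystem:
    """Hypothetical inventory record for one production AI system."""
    name: str
    deployed_on: date
    has_documentation: bool  # e.g. a model card or design record exists
    has_monitoring: bool     # e.g. quality or drift alerts are wired up
    has_audit_trail: bool    # e.g. decisions can be traced after the fact


def governance_gaps(system: DeployedSystem) -> list[str]:
    """Return the governance artifacts this system is missing."""
    checks = {
        "documentation": system.has_documentation,
        "monitoring": system.has_monitoring,
        "audit_trail": system.has_audit_trail,
    }
    return [artifact for artifact, present in checks.items() if not present]


def debt_report(systems, as_of: date) -> dict:
    """Map each system with gaps to (missing artifacts, days in production)."""
    report = {}
    for system in systems:
        gaps = governance_gaps(system)
        if gaps:
            days_live = (as_of - system.deployed_on).days
            report[system.name] = (gaps, days_live)
    return report


if __name__ == "__main__":
    # Made-up inventory, purely for illustration.
    inventory = [
        DeployedSystem("credit-scoring", date(2023, 11, 1), True, False, False),
        DeployedSystem("support-chatbot", date(2024, 9, 15), True, True, True),
    ]
    for name, (gaps, days_live) in debt_report(inventory, date(2025, 5, 1)).items():
        print(f"{name}: missing {', '.join(gaps)} after {days_live} days in production")
```

The point of a report like this is less the code than the habit: if the list of gaps is visible and dated, the debt gets paid down deliberately instead of being discovered during a retrofit.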
The concentration of AI capability in a small number of organizations worries me from an ecosystem health perspective. The organizations with the most compute, the most data, and the most engineering talent are pulling ahead in ways that create structural barriers to entry. This is a policy problem as much as a technology problem, and I don't have a confident prediction for how it resolves.
My Final Message to the Community
The technology is ahead of the governance, the governance is ahead of the culture, and the culture is what ultimately determines whether any of this produces durable value. Build the governance first. Invest in the culture that treats data quality as a non-negotiable. Train the people, not just the models. The organizations that figure out the people and process side of AI — not just the platform side — are the ones that will still be shipping production AI systems in 2030 when the current wave of hype has cycled out.
We're at the beginning of something significant. The tools are real. The demand is real. The opportunity is real. Don't waste it by skipping the boring parts. As always, I'm here to help.