The Big Picture: What Data + AI Summit 2024 Really Signaled

I've been to enough data conferences to have calibrated expectations. The announcements are usually incremental, the keynotes are always polished to the point of being a little hollow, and you get three genuinely useful things out of talking to other practitioners in the hallway. Data + AI Summit 2024 was different. Not because any single announcement was paradigm-shattering, but because of what the whole picture added up to.

Something shifted this year. Here's what I think it was.

16,000 People Don't Lie About What's Happening

Over 16,000 attendees in person. 40,000+ virtual. Those numbers matter not because Databricks wants them in its press release, but because of who showed up: not just data engineers and platform teams, but AI teams, product teams, and executives who've never been to a data conference before. The demand signal has changed. The people who didn't care about data infrastructure two years ago are now very, very interested in what the infrastructure can do for them.

That attendance profile is a leading indicator. The people who show up to a conference tell you what their companies are paying attention to, and right now nearly every company is trying to figure out how to build AI systems on its own data. That's Databricks' exact value proposition, and the timing favors them in a way it never quite has before.

The Three Themes Databricks Kept Returning To

Strip away the product announcements and you see three recurring arguments:

GenAI demand is real and it's landing on data teams. Not as a research project or a skunkworks initiative, but as production requirements: "we need to build a chatbot over our knowledge base," "we need to fine-tune a model on our customer data," "we need retrieval-augmented generation against our internal documents." Data teams who built pipelines for analytical workloads are now being asked to support AI workloads, and the tooling hasn't fully caught up.

Governance pressure is accelerating. GDPR is old news. The EU AI Act is not. Every enterprise with a serious AI initiative is running into the same question: how do we track what data trained this model, what version of the model is in production, and who approved it? Unity Catalog, and more specifically the extension of Unity Catalog to govern AI assets, was positioned as the answer. I'll dig into that more later this month.

Data estate complexity hasn't gotten simpler. Most large enterprises have data across three clouds, in five different formats, governed by four different teams with different tools. Databricks' interoperability story — UniForm, the Tabular acquisition, open-sourcing Unity Catalog — is an explicit acknowledgment that "just use Databricks for everything" is not a realistic answer for most of their customers. That's a more honest position than I expected.

Are We Actually Ready for This?

Honest answer: most organizations are not. And I say that having spent the last several years building exactly the kind of metadata-driven, governance-aware data platforms that are supposed to be the foundation for this. The problem isn't tooling — Databricks and its competitors have the tools. The problem is organizational: you cannot build a reliable AI system on top of a data estate you don't fully understand.

The teams that are going to win over the next two years are the ones that spent the last two years getting their data house in order — solid lineage, working governance, reliable pipelines, curated feature stores. Everyone else is going to spend 2024 and 2025 finding out that "AI-ready" isn't a switch you flip on top of a messy data platform. It's the output of years of boring data engineering work.

DAIS 2024 signaled that the demand is here. Whether the supply is ready is a different question. As always, I'm here to help.
