The 2024 Lakehouse Evolution: What Actually Changed

It's December, and I've had six months to see which DAIS 2024 announcements mattered and which were conference-stage noise. The scorecard is clearer now. Here's my honest read on what changed for the lakehouse in 2024.

What Genuinely Moved the Industry Forward

Unity Catalog open-sourcing. This one landed. The project has genuine contributor activity from non-Databricks companies. The argument that you should govern your data and AI assets through a proprietary catalog became harder to make — the catalog is now open. More importantly, the ecosystem tooling is catching up: other data platforms are starting to build UC-compatible integrations because the spec is now open for them to implement against. This is a multi-year play and the 2024 chapter is the foundation.

Vector Search GA + RAG pattern standardization. RAG went from "thing researchers are excited about" to "pattern data engineers implement in production." Databricks Vector Search's UC integration — governing vector index access through the same permission model as tables — resolved the biggest governance concern with enterprise RAG. Teams that were hesitating on RAG because of access control uncertainty now have a clear path.

UniForm interoperability. Quieter than the other announcements, but more consequential for enterprises with multi-engine data estates. The ability to read Delta tables with Iceberg-compatible engines eliminated a category of data copy jobs that existed solely to convert between formats. Once the same table can be read through either format, the format war is effectively over, and that makes Delta a more defensible default for new projects.
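For readers who haven't turned UniForm on yet, here is a minimal sketch of what enabling it on an existing table looks like. It only builds the SQL string; you would run the result through your own Spark session or SQL editor. The helper name `uniform_enable_sql` and the table name `main.sales.orders` are mine, purely for illustration. The three table properties are the ones I understand the Databricks documentation to list for enabling Iceberg reads on an existing Delta table, but runtime requirements vary, so check the docs for your version before relying on this.

```python
def uniform_enable_sql(table_name: str) -> str:
    """Build an ALTER TABLE statement that enables UniForm (Iceberg) on a Delta table.

    Hypothetical helper: it only renders SQL text and does not talk to a cluster.
    """
    # Table properties as I understand them from the Databricks UniForm docs.
    # Treat them as an assumption and verify against your runtime version.
    props = {
        "delta.columnMapping.mode": "name",
        "delta.enableIcebergCompatV2": "true",
        "delta.universalFormat.enabledFormats": "iceberg",
    }
    rendered = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({rendered})"


if __name__ == "__main__":
    # Example table name is made up; pass the statement to spark.sql(...) yourself.
    print(uniform_enable_sql("main.sales.orders"))
```

The point is how little there is to it: one statement on the table replaces the copy job that used to sit between engines.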

What Still Feels Unsolved

The agent framework story is still immature. Mosaic AI Agent Framework showed the right architecture in preview, but as of the end of 2024 the tooling for testing, monitoring, and debugging compound AI systems still hasn't caught up to what production requires. Teams building production agents are still writing significant custom scaffolding that shouldn't be necessary.

Cross-cloud and cross-account lineage remains a real gap. Enterprises with data in multiple Databricks accounts, or with data moving between Databricks and external systems, still can't get a complete lineage picture from within UC. The lineage story is excellent within a single account; it stops at the boundary.

The cost governance story also needs more tooling. You can monitor Databricks spend with external cost management tools, but native cost attribution — "this model served these inferences and cost this much" — is not yet the plug-and-play experience it needs to be for teams managing GenAI budgets.
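To make "this model served these inferences and cost this much" concrete, here is a rough sketch of the roll-up teams end up writing themselves. Everything in it is an assumption for illustration: the `UsageRecord` shape, the endpoint names, and the flat `price_per_dbu` are hypothetical, and none of it is a Databricks API. In practice the records would come from whatever usage export you have.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One usage row. Hypothetical shape, not a Databricks schema."""
    endpoint: str   # serving endpoint the usage is tagged with
    requests: int   # inference requests in the billing window
    dbus: float     # DBUs consumed in the same window


def attribute_cost(records, price_per_dbu):
    """Roll usage up per endpoint and derive cost and cost per 1k requests."""
    totals = defaultdict(lambda: {"requests": 0, "dbus": 0.0})
    for record in records:
        totals[record.endpoint]["requests"] += record.requests
        totals[record.endpoint]["dbus"] += record.dbus

    report = {}
    for endpoint, total in totals.items():
        cost = total["dbus"] * price_per_dbu
        requests = total["requests"]
        report[endpoint] = {
            "requests": requests,
            "cost": cost,
            # Guard against endpoints that billed DBUs but served no requests.
            "cost_per_1k_requests": cost * 1000 / requests if requests else 0.0,
        }
    return report


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the output.
    sample = [
        UsageRecord("rag-chat", 1000, 10.0),
        UsageRecord("rag-chat", 1000, 10.0),
        UsageRecord("summarizer", 500, 5.0),
    ]
    for name, row in attribute_cost(sample, price_per_dbu=0.5).items():
        print(name, row)
```

None of this is hard, which is exactly the complaint: it is glue code every GenAI team rewrites because the per-model view doesn't come out of the box.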

My Predictions for 2025

Agent framework maturity accelerates in H1 2025. The preview state of the agent tooling at DAIS was clearly a "we need to ship something" decision, not a "this is done" decision, so expect significant investment here. The Tabular acquisition pays off through deeper Iceberg ecosystem integration. And Unity Catalog gets extended to explicitly govern agent definitions and tool registries, so the governance story for compound AI systems starts to catch up to the governance story for data assets.
