The Open Source Flex: Tabular Acquisition and Open-Sourcing Unity Catalog
Databricks announced its acquisition of Tabular in June 2024, days before DAIS 2024. If that name doesn't immediately ring a bell: Tabular was founded by Ryan Blue, Daniel Weeks, and Jason Reid. Blue and Weeks are the original creators of Apache Iceberg, the table format that has been eating into Delta Lake's market position for the last two years. Databricks bought the company built by the people behind the most credible competition to its own open table format.
At the conference itself, Databricks open-sourced Unity Catalog.
Whether you read this as a genuine philosophical commitment or as competitive maneuvering, it's one of the more interesting moves I've seen in the data infrastructure space in years.
Why Open Formats Matter More Now Than Two Years Ago
Two years ago, the open format conversation was mostly theoretical: vendor lock-in risk, future optionality, not wanting to bet the entire data estate on one company's table format. That was a reasonable architectural concern but not an urgent one. In 2024, it's urgent for a different reason: AI training data needs to be accessible to multiple compute engines, not just one.
Fine-tuning a model might happen in Databricks. The vector embeddings might live in a different system. The feature store might be shared across Databricks, dbt, and a custom inference service. When your data has to move between these systems, you want it in a format all of them can read natively, not a format that requires an export step. Open formats reduce that friction. The Tabular acquisition and Unity Catalog open-sourcing are Databricks saying: we're building an open ecosystem, not a proprietary one.
Is This a Genuine Stance or Competitive Pressure?
Both, honestly. And I don't think that's cynical — those two things aren't mutually exclusive.
Databricks has always leaned open source. Apache Spark came out of UC Berkeley's AMPLab, where the company's founders did their research. Delta Lake was open-sourced in 2019. MLflow is Apache-licensed. The company's DNA is genuinely aligned with the open-source community in a way that Snowflake's, for example, is not. That's real.
It's also true that the competitive pressure from Iceberg, and from the cloud and warehouse vendors who built native Iceberg support, was real. Google BigQuery, Amazon Athena, and Snowflake all support Iceberg. The ecosystem was fragmenting in a way that was going to cost Databricks deals. Buying Tabular and open-sourcing UC is partly a response to that pressure: if you can't beat the open format movement, lead it.
Both motivations lead to the same outcome: a more interoperable ecosystem. I'll take it, whatever the reasons behind it.
What This Means for the Lakehouse vs. Warehouse Debate
The honest answer is that the debate has become less interesting than it was in 2021. The lakehouse architecture won in the sense that every major data warehouse vendor now has a lakehouse story, and every major lakehouse vendor now has a SQL analytics story. The boundaries have blurred.
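What UniForm and Iceberg interoperability do is blur them further. With UniForm enabled, a Delta table also publishes Iceberg metadata over the same underlying Parquet files, so Iceberg-compatible engines can read it without a conversion step. The reverse direction depends on each engine's own Iceberg support rather than on UniForm. The practical implication is that you can run read-only Snowflake queries on data that lives in your Delta Lake, without copying it. That was not possible two years ago.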
If you've been deferring architectural decisions because you weren't sure which format to commit to, the answer is getting clearer: pick whichever one your primary platform uses natively, and rely on the interoperability layer to handle the rest. The format war is winding down. As always, I'm here to help.