Unity Catalog: Setting Up Your First Metastore
Unity Catalog has been in preview since mid-2021. If you've been watching it and wondering when to actually engage with it, the answer is: now is a reasonable time to start experimenting, especially if you have a new workspace or a non-critical environment where you can work without fear of breaking production.
Here's what setting up your first UC metastore actually involves — and what the Databricks documentation doesn't fully prepare you for.
What Unity Catalog Is
Unity Catalog is a centralized governance layer for Databricks. Instead of each workspace having its own Hive metastore (with its own isolated tables, permissions, and no visibility across workspaces), Unity Catalog creates a single metastore that multiple workspaces share. One place to manage access, one place to see lineage, one place to manage data. That's the promise.
The object hierarchy is Metastore → Catalog → Schema → Table. The Catalog level is new if you're coming from Hive. In Hive, names were two-level: schema.table. In UC, they are three-level: catalog.schema.table.
-- Unity Catalog three-tier namespace
SELECT * FROM my_catalog.sales_data.customer_orders;
--            ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^^^^^^
--            Catalog    Schema     Table
-- vs Hive two-tier namespace
SELECT * FROM sales_data.customer_orders;
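To make the namespace difference concrete, here is a small sketch of how name resolution plays out once a session has a default catalog and schema. It reuses the example names above and assumes those objects exist in your metastore.

```sql
-- Set a default catalog and schema for the session
USE CATALOG my_catalog;
USE SCHEMA sales_data;

-- Confirm what unqualified names will resolve against
SELECT current_catalog(), current_schema();

-- With those defaults in place, all three refer to the same table
SELECT * FROM customer_orders;
SELECT * FROM sales_data.customer_orders;
SELECT * FROM my_catalog.sales_data.customer_orders;
```

In shared notebooks and jobs, spelling out all three levels is probably the safer habit, since the session default can differ from one user or cluster to the next.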
Creating the Metastore
Metastore creation happens in the Databricks Account Console, not in the workspace admin UI. You need an Account Admin role — different from a workspace admin. This distinction trips up a lot of teams. If you've only ever worked in individual workspaces, you may not have Account Console access.
Once you have access, the metastore setup requires:
- An ADLS Gen2 storage account with a container (Azure) or an S3 bucket (AWS) for the metastore root storage
- A storage credential: a managed identity or service principal (Azure), or an IAM role (AWS), with access to that storage
- Creating the metastore in the Account Console, pointed at that storage location
- Assigning the metastore to one or more workspaces
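Once a workspace has been assigned, a quick sanity check from a notebook or the SQL editor can confirm it is actually attached to the new metastore. This is a minimal sketch, assuming you run it on compute that supports Unity Catalog.

```sql
-- Returns the ID of the metastore attached to this workspace
SELECT current_metastore();

-- Lists the catalogs visible to you
SHOW CATALOGS;
```

If the first query fails, the likely suspects are a missing workspace assignment or a cluster that isn't configured for Unity Catalog.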
Creating Your First External Location
External locations map a storage path to a storage credential, allowing Unity Catalog to access files at that path. This is how you tell UC "when a query accesses anything under abfss://<container>@<storage-account>.dfs.core.windows.net/, use this credential."
-- Create a storage credential first (done via UI or REST API)
-- Then create an external location that uses it
CREATE EXTERNAL LOCATION prod_datalake
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL prod_storage_credential)
COMMENT 'Production data lake — all zones';
-- Verify access by listing a path under the location
LIST 'abfss://<container>@<storage-account>.dfs.core.windows.net/bronze/';
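After creating the location, it helps to inspect what the metastore now knows about and to grant file access explicitly. The sketch below reuses the prod_datalake name from above; the data_engineers group is a hypothetical principal, so swap in one that exists in your account.

```sql
-- Inspect the credentials and locations registered in the metastore
SHOW STORAGE CREDENTIALS;
SHOW EXTERNAL LOCATIONS;
DESCRIBE EXTERNAL LOCATION prod_datalake;

-- Let a group read files under the location (group name is hypothetical)
GRANT READ FILES ON EXTERNAL LOCATION prod_datalake TO `data_engineers`;

-- Review who holds which privileges on the location
SHOW GRANTS ON EXTERNAL LOCATION prod_datalake;
```

Creating the location does not by itself give other users access to the path, so the grant step is easy to forget.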
Creating Catalogs and Schemas
CREATE CATALOG IF NOT EXISTS prod_analytics
COMMENT 'Production analytics catalog — data engineering owned';
USE CATALOG prod_analytics;
CREATE SCHEMA IF NOT EXISTS sales
COMMENT 'Sales domain — order, customer, and product data';
CREATE SCHEMA IF NOT EXISTS finance
COMMENT 'Finance domain — billing, revenue, and AR data';
-- Grant access
GRANT USAGE ON CATALOG prod_analytics TO `<user-or-group>`;
GRANT USAGE ON SCHEMA prod_analytics.sales TO `<user-or-group>`;
GRANT SELECT ON SCHEMA prod_analytics.sales TO `<user-or-group>`;
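Grants are easy to issue and easy to lose track of, so it's worth checking what actually landed. A minimal sketch, where analysts is a hypothetical group name standing in for whichever principal you granted to:

```sql
-- Everything granted on the schema, across all principals
SHOW GRANTS ON SCHEMA prod_analytics.sales;

-- Only what one principal holds on the catalog (group name is hypothetical)
SHOW GRANTS `analysts` ON CATALOG prod_analytics;

-- Remove a grant that turned out to be too broad
REVOKE SELECT ON SCHEMA prod_analytics.sales FROM `analysts`;
```

Granting to groups rather than to individual users tends to keep this output readable as the number of people with access grows.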
The Part That Surprises Most Teams
Existing Hive metastore tables don't automatically appear in Unity Catalog. They live in the workspace-local Hive metastore, accessible via the hive_metastore catalog in a UC-enabled workspace, but they're not governed by UC permissions. Migration is a separate step; more on that in a future post. For now, expect to work with a mixed environment during any transition period.
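In practice, a mixed environment means legacy tables and UC tables sit side by side and can be used in the same query. The sketch below is illustrative only: it assumes a legacy sales_data.customer_orders table, a UC table named prod_analytics.sales.customers, and a shared customer_id column, all of which are hypothetical.

```sql
-- See what still lives in the workspace-local Hive metastore
SHOW SCHEMAS IN hive_metastore;
SHOW TABLES IN hive_metastore.sales_data;

-- Join a legacy table to a UC table in one query
-- (table and column names are hypothetical)
SELECT o.*
FROM hive_metastore.sales_data.customer_orders AS o
JOIN prod_analytics.sales.customers AS c
  ON o.customer_id = c.customer_id;
```

Access to the hive_metastore side is still controlled by whatever the legacy workspace uses, not by the UC grants shown earlier.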