Unity Catalog: Setting Up Your First Metastore

Unity Catalog has been in preview since mid-2021. If you've been watching it and wondering when to actually engage with it, the answer is: now is a reasonable time to start experimenting, especially if you have a new workspace or a non-critical environment where you can work without fear of breaking production.

Here's what setting up your first UC metastore actually involves — and what the Databricks documentation doesn't fully prepare you for.

What Unity Catalog Is

Unity Catalog is a centralized governance layer for Databricks. Instead of each workspace having its own Hive metastore (with its own isolated tables, permissions, and no visibility across workspaces), Unity Catalog creates a single metastore that multiple workspaces share. One place to manage access, one place to see lineage, one place to manage data. That's the promise.

The catalog hierarchy: Metastore → Catalog → Schema → Table. The first level — Catalog — is new if you're coming from Hive. In Hive, you had Schema → Table. In UC, you have a three-tier namespace.

-- Unity Catalog three-tier namespace
SELECT * FROM my_catalog.sales_data.customer_orders;
--            ^^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^^^^^^
--            Catalog    Schema     Table

-- vs Hive two-tier namespace
SELECT * FROM sales_data.customer_orders;
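
If you'd rather not spell out the catalog every time, you can set a default for the session. The sketch below reuses the illustrative my_catalog and sales_data names from the example above; they are not real objects in any particular workspace.

-- Set the session default so two-part names resolve against my_catalog
USE CATALOG my_catalog;

-- Now equivalent to my_catalog.sales_data.customer_orders
SELECT * FROM sales_data.customer_orders;

-- Check which catalog and schema the session is currently pointed at
SELECT current_catalog(), current_schema();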

Creating the Metastore

Metastore creation happens in the Databricks Account Console, not in the workspace admin UI. You need an Account Admin role — different from a workspace admin. This distinction trips up a lot of teams. If you've only ever worked in individual workspaces, you may not have Account Console access.

Once you have access, the metastore setup requires:

  1. An ADLS Gen2 storage account with a container (Azure) or an S3 bucket (AWS) for the metastore root storage
  2. A storage credential: a managed identity or service principal (Azure), or an IAM role (AWS), with access to that storage
  3. Creating the metastore in the Account Console, pointed at that storage location
  4. Assigning the metastore to one or more workspaces
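
Once the metastore is assigned, a quick sanity check from a notebook or SQL editor in one of those workspaces can confirm the attachment. This is a minimal sketch and assumes you are running on UC-capable compute; the output depends entirely on your account.

-- Returns the ID of the metastore this workspace is attached to
SELECT current_metastore();

-- Lists the catalogs visible to you; expect hive_metastore plus any UC catalogs
SHOW CATALOGS;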

Creating Your First External Location

External locations map a storage path to a credential, allowing Unity Catalog to access files at that path. This is how you tell UC "when code in this workspace accesses abfss://<container>@<storage-account>.dfs.core.windows.net/, use this credential."

-- Create a storage credential first (done via UI or REST API)
-- Then create an external location that uses it
CREATE EXTERNAL LOCATION prod_datalake
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL prod_storage_credential)
COMMENT 'Production data lake — all zones';

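-- Verify access by listing a path under the external location
-- (<container> and <storage-account> are placeholders for your own names)
LIST 'abfss://<container>@<storage-account>.dfs.core.windows.net/bronze/';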
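
After creating the location, it can help to inspect it and decide who is allowed to read through it. The sketch below assumes the prod_datalake location from above; the data_engineers group is a hypothetical principal used only for illustration.

-- List all external locations defined in the metastore
SHOW EXTERNAL LOCATIONS;

-- Show the URL, credential, and owner of one location
DESCRIBE EXTERNAL LOCATION prod_datalake;

-- Let a group read files directly from paths under the location
-- (data_engineers is a hypothetical group name)
GRANT READ FILES ON EXTERNAL LOCATION prod_datalake TO `data_engineers`;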

Creating Catalogs and Schemas

CREATE CATALOG IF NOT EXISTS prod_analytics
COMMENT 'Production analytics catalog — data engineering owned';

USE CATALOG prod_analytics;

CREATE SCHEMA IF NOT EXISTS sales
COMMENT 'Sales domain — order, customer, and product data';

CREATE SCHEMA IF NOT EXISTS finance
COMMENT 'Finance domain — billing, revenue, and AR data';

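-- Grant access (USE CATALOG and USE SCHEMA are the current names
-- for what earlier Unity Catalog releases called USAGE)
GRANT USE CATALOG ON CATALOG prod_analytics TO `[email protected]`;
GRANT USE SCHEMA ON SCHEMA prod_analytics.sales TO `[email protected]`;
GRANT SELECT ON SCHEMA prod_analytics.sales TO `[email protected]`;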
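
It is worth confirming that the grants landed where you expected. A minimal check, assuming the prod_analytics catalog and sales schema created above:

-- Confirm both schemas exist in the catalog
SHOW SCHEMAS IN prod_analytics;

-- Review who holds which privileges at each level
SHOW GRANTS ON CATALOG prod_analytics;
SHOW GRANTS ON SCHEMA prod_analytics.sales;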

The Part That Surprises Most Teams

Existing Hive metastore tables don't automatically appear in Unity Catalog. They live in the workspace-local Hive metastore, accessible via the hive_metastore catalog in a UC-enabled workspace, but they're not governed by UC permissions. Migration is a separate step — more on that in a future post. For now, expect to work with a mixed environment during any transition period. As always, I'm here to help.
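
In practice, the mixed environment looks something like the sketch below. It assumes the Hive table from the earlier two-tier example (sales_data.customer_orders) exists in your workspace-local metastore; substitute your own schema and table names.

-- Legacy schemas show up under the hive_metastore catalog
SHOW SCHEMAS IN hive_metastore;

-- Legacy tables are addressed with a three-part name, but access is
-- still controlled by the old Hive permissions rather than UC grants
SELECT COUNT(*) FROM hive_metastore.sales_data.customer_orders;

-- UC-governed schemas sit alongside them in their own catalogs
SHOW SCHEMAS IN prod_analytics;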
