Unity Catalog: External Locations and Storage Credentials Explained

After creating a Unity Catalog metastore, the first practical question is: how do you give it access to your data that already lives in cloud storage? The answer involves two concepts that work together — storage credentials and external locations — and getting them right is the foundation for everything else in UC.

Storage Credentials

A storage credential is a reference to a cloud identity that Databricks can use to access storage on your behalf. On Azure, this is a managed identity or a service principal. On AWS, it's an IAM role. The credential itself doesn't contain the storage path — it's just the authentication mechanism.

Storage credentials are metastore-level objects. Once created, a credential is available to every workspace attached to that metastore, subject to the privileges granted on it.

-- Create a storage credential (Azure: using managed identity)
-- The managed identity must have Storage Blob Data Contributor on the storage account
-- Note: confirm that your workspace accepts this statement; storage credentials are
-- often created through Catalog Explorer, the Databricks CLI, or Terraform instead
CREATE STORAGE CREDENTIAL prod_adls_credential
WITH AZURE MANAGED IDENTITY (DIRECTORY_ID = 'your-tenant-id',
MANAGED_IDENTITY_ID = 'your-managed-identity-object-id')
COMMENT 'Managed identity for production ADLS Gen2 access';

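-- Verify the credential works
-- (<container> and <storage-account> are placeholders for your own names)
TEST STORAGE CREDENTIAL prod_adls_credential
ON WRITE TO 'abfss://<container>@<storage-account>.dfs.core.windows.net/test/';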
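
Once a credential exists, it helps to confirm what the metastore actually recorded and to decide who may build on top of it. The following is a minimal sketch, assuming the prod_adls_credential name used above; the `platform-admins` group is hypothetical and stands in for whichever group manages storage in your account.

```sql
-- List the storage credentials visible to you in this metastore
SHOW STORAGE CREDENTIALS;

-- Show the cloud identity recorded for one credential
DESCRIBE STORAGE CREDENTIAL prod_adls_credential;

-- Allow a group to create external locations that use this credential
-- (`platform-admins` is a hypothetical group name)
GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL prod_adls_credential
TO `platform-admins`;
```

Granting CREATE EXTERNAL LOCATION rather than ownership keeps the credential itself under central control while letting another team define the paths.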

External Locations

An external location combines a storage credential with a specific path. It says: "when code in this metastore accesses this URL, use this credential." External locations are how Unity Catalog maps abstract storage paths to concrete cloud access.

-- Create external locations for each zone
-- (<bronze-container>, <silver-container>, <gold-container>, and <storage-account> are placeholders)
CREATE EXTERNAL LOCATION bronze_zone
URL 'abfss://<bronze-container>@<storage-account>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL prod_adls_credential)
COMMENT 'Bronze zone: raw ingested data';

CREATE EXTERNAL LOCATION silver_zone
URL 'abfss://<silver-container>@<storage-account>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL prod_adls_credential)
COMMENT 'Silver zone: cleaned and conformed data';

CREATE EXTERNAL LOCATION gold_zone
URL 'abfss://<gold-container>@<storage-account>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL prod_adls_credential)
COMMENT 'Gold zone: curated, ready for consumption';
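
A quick way to check that a location resolves to the storage you expect is to inspect it and list a few files through it. This is a sketch under the assumption that the bronze_zone location above exists and that you hold READ FILES on it; the URL uses placeholder container and account names.

```sql
-- List every external location defined in the metastore
SHOW EXTERNAL LOCATIONS;

-- Show the URL and storage credential behind one location
DESCRIBE EXTERNAL LOCATION bronze_zone;

-- Browse the first few objects under the location's path
-- (<bronze-container> and <storage-account> are placeholders)
LIST 'abfss://<bronze-container>@<storage-account>.dfs.core.windows.net/' LIMIT 10;
```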

Granting Access to External Locations

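Creating an external location does not grant anyone else access to it. Access is granted per location with the READ FILES and WRITE FILES privileges:

-- Grant access to specific principals
-- (<principal> is a placeholder for a user email or group name)
GRANT READ FILES ON EXTERNAL LOCATION bronze_zone TO `<principal>`;
GRANT WRITE FILES ON EXTERNAL LOCATION silver_zone TO `<principal>`;

-- For service principals running automated pipelines
-- (a service principal is referenced by its application ID in GRANT statements;
-- databricks-pipeline-sp stands in for that ID here)
GRANT READ FILES ON EXTERNAL LOCATION bronze_zone TO `databricks-pipeline-sp`;
GRANT WRITE FILES ON EXTERNAL LOCATION silver_zone TO `databricks-pipeline-sp`;
GRANT WRITE FILES ON EXTERNAL LOCATION gold_zone TO `databricks-pipeline-sp`;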
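
Grants are easier to keep tidy when they are audited and trimmed regularly. The statements below are a minimal sketch using the location and principal names from the grants above; whether the pipeline should lose write access to gold_zone is purely an illustrative assumption.

```sql
-- See every privilege granted on a location
SHOW GRANTS ON EXTERNAL LOCATION silver_zone;

-- See what one principal holds on a location
SHOW GRANTS `databricks-pipeline-sp` ON EXTERNAL LOCATION gold_zone;

-- Remove a privilege that is no longer needed (illustrative choice)
REVOKE WRITE FILES ON EXTERNAL LOCATION gold_zone FROM `databricks-pipeline-sp`;

-- File privileges do not cover table creation: registering an external
-- table under a location requires CREATE EXTERNAL TABLE on that location
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION silver_zone TO `databricks-pipeline-sp`;
```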

External Tables vs Managed Tables

Once external locations are set up, you can create both external tables and managed tables in Unity Catalog:

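-- External table: data lives at a path you control
-- Dropping the table does NOT delete the data
-- (USING must come before LOCATION; the container and account names are placeholders)
CREATE TABLE prod_analytics.silver.customer_orders
USING DELTA
LOCATION 'abfss://<silver-container>@<storage-account>.dfs.core.windows.net/customer_orders/';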

-- Managed table: data lives in the metastore's managed storage
-- Dropping the table DOES delete the data
CREATE TABLE prod_analytics.silver.customer_segments
USING DELTA;
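
Because the two kinds of table behave so differently on DROP, it is worth checking which kind you have before removing anything. A short sketch, assuming the two tables above exist:

```sql
-- The extended description includes a Type row (EXTERNAL or MANAGED)
-- and a Location row showing where the data files live
DESCRIBE TABLE EXTENDED prod_analytics.silver.customer_orders;
DESCRIBE TABLE EXTENDED prod_analytics.silver.customer_segments;

-- The same check for a whole schema, via the catalog's information schema
SELECT table_name, table_type
FROM prod_analytics.information_schema.tables
WHERE table_schema = 'silver';
```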

The Gotcha: Path Overlap

External locations must not overlap. If you create an external location for abfss://<container>@<storage-account>.dfs.core.windows.net/ (the root) and then try to create another for abfss://<container>@<storage-account>.dfs.core.windows.net/bronze/ (a subfolder), the second one will fail because it falls within the first. Plan your external location hierarchy before creating them: it's much easier to design it right the first time than to reorganize it later.
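
One way to catch an overlap before it fails is to compare a candidate URL against the locations already registered. The query below is a sketch, not a definitive check: it assumes the system.information_schema.external_locations view is available in your metastore with external_location_name and url columns, and the candidate URL uses placeholder names.

```sql
-- Candidate URL for the location you are about to create (placeholder values)
WITH candidate AS (
  SELECT 'abfss://<container>@<storage-account>.dfs.core.windows.net/bronze/' AS url
)
SELECT l.external_location_name, l.url
FROM system.information_schema.external_locations AS l
CROSS JOIN candidate AS c
-- Normalize both URLs to end in exactly one slash, then test whether
-- either one is a prefix of the other (parent or child of an existing path)
WHERE startswith(TRIM(TRAILING '/' FROM c.url) || '/', TRIM(TRAILING '/' FROM l.url) || '/')
   OR startswith(TRIM(TRAILING '/' FROM l.url) || '/', TRIM(TRAILING '/' FROM c.url) || '/');
```

Any row returned is an existing location that contains, or is contained by, the candidate path. The trailing-slash normalization avoids flagging sibling folders that merely share a name prefix, such as bronze/ and bronze2/.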
