Unity Catalog: External Locations and Storage Credentials Explained
After creating a Unity Catalog metastore, the first practical question is: how do you give it access to your data that already lives in cloud storage? The answer involves two concepts that work together — storage credentials and external locations — and getting them right is the foundation for everything else in UC.
Storage Credentials
A storage credential is a reference to a cloud identity that Databricks can use to access storage on your behalf. On Azure, this is a managed identity or a service principal. On AWS, it's an IAM role. The credential itself doesn't contain the storage path — it's just the authentication mechanism.
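Storage credentials are securable objects registered in the metastore rather than in an individual workspace, so every workspace attached to that metastore can use them, subject to the privileges granted on each credential.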
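-- Create a storage credential (Azure: using managed identity)
-- The managed identity must have Storage Blob Data Contributor on the storage account
-- Note: treat this statement as illustrative. Storage credentials are commonly created through
-- Catalog Explorer, the Databricks CLI, or the REST API, and the SQL form may not be available in your workspace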
CREATE STORAGE CREDENTIAL prod_adls_credential
WITH AZURE MANAGED IDENTITY (DIRECTORY_ID = 'your-tenant-id',
MANAGED_IDENTITY_ID = 'your-managed-identity-object-id')
COMMENT 'Managed identity for production ADLS Gen2 access';
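-- Verify the credential works
-- Note: if this statement is not accepted in your workspace, validate the credential from Catalog Explorer or the API instead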
TEST STORAGE CREDENTIAL prod_adls_credential
ON WRITE TO 'abfss://[email protected]/test/';
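To confirm what was registered, you can inspect the metastore with a few read-only statements. This is a minimal sketch that reuses the prod_adls_credential name from above; the `data-platform-admins` group is a hypothetical principal used only for illustration.

```sql
-- List the storage credentials visible to you in the current metastore
SHOW STORAGE CREDENTIALS;

-- Show the details of a single credential
DESCRIBE STORAGE CREDENTIAL prod_adls_credential;

-- Allow a (hypothetical) admin group to define external locations on top of it
GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL prod_adls_credential TO `data-platform-admins`;
```

Granting CREATE EXTERNAL LOCATION on the credential, rather than handing out the cloud identity itself, keeps the authentication details inside Unity Catalog.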
External Locations
An external location combines a storage credential with a specific path. It says: "when code in this metastore accesses this URL, use this credential." External locations are how Unity Catalog maps abstract storage paths to concrete cloud access.
-- Create external locations for each zone
CREATE EXTERNAL LOCATION bronze_zone
URL 'abfss://[email protected]/'
WITH (STORAGE CREDENTIAL prod_adls_credential)
COMMENT 'Bronze zone — raw ingested data';
CREATE EXTERNAL LOCATION silver_zone
URL 'abfss://[email protected]/'
WITH (STORAGE CREDENTIAL prod_adls_credential)
COMMENT 'Silver zone — cleaned and conformed data';
CREATE EXTERNAL LOCATION gold_zone
URL 'abfss://[email protected]/'
WITH (STORAGE CREDENTIAL prod_adls_credential)
COMMENT 'Gold zone — curated, ready for consumption';
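After creating the zones, it can help to check how each one was registered. The sketch below assumes the three locations above were created successfully and uses only read-only statements.

```sql
-- List all external locations in the metastore with their URLs and credentials
SHOW EXTERNAL LOCATIONS;

-- Inspect one location: its URL, the storage credential it uses, owner, and comment
DESCRIBE EXTERNAL LOCATION bronze_zone;
```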
Granting Access to External Locations
-- Grant access to specific principals
GRANT READ FILES ON EXTERNAL LOCATION bronze_zone TO `[email protected]`;
GRANT WRITE FILES ON EXTERNAL LOCATION silver_zone TO `[email protected]`;
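-- For service principals running automated pipelines
-- (Unity Catalog identifies a service principal in GRANT statements by its application ID; the name below is a placeholder)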
GRANT READ FILES ON EXTERNAL LOCATION bronze_zone TO `databricks-pipeline-sp`;
GRANT WRITE FILES ON EXTERNAL LOCATION silver_zone TO `databricks-pipeline-sp`;
GRANT WRITE FILES ON EXTERNAL LOCATION gold_zone TO `databricks-pipeline-sp`;
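READ FILES and WRITE FILES cover direct access to paths under a location. Registering tables on those paths is a separate privilege. The following sketch reuses the `databricks-pipeline-sp` principal from above to show how that privilege is granted and how grants can be audited or withdrawn.

```sql
-- Allow the pipeline principal to register external tables under the silver zone
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION silver_zone TO `databricks-pipeline-sp`;

-- Audit: who holds which privileges on a location
SHOW GRANTS ON EXTERNAL LOCATION silver_zone;

-- Audit: what one principal holds on a location
SHOW GRANTS `databricks-pipeline-sp` ON EXTERNAL LOCATION gold_zone;

-- Withdraw a privilege that is no longer needed
REVOKE WRITE FILES ON EXTERNAL LOCATION gold_zone FROM `databricks-pipeline-sp`;
```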
External Tables vs Managed Tables
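Once external locations are set up, you can create external tables on any path they cover, provided you hold CREATE EXTERNAL TABLE on the location. Managed tables take no LOCATION clause; Unity Catalog places their data in managed storage for you: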
-- External table: data lives at a path you control
-- Dropping the table does NOT delete the data
CREATE TABLE prod_analytics.silver.customer_orders
USING DELTA
LOCATION 'abfss://[email protected]/customer_orders/';
-- Managed table: data lives in managed storage (configured at the metastore, catalog, or schema level)
-- Dropping the table DOES delete the data (after a retention period)
CREATE TABLE prod_analytics.silver.customer_segments
USING DELTA;
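To check which kind of table you ended up with, you can look at the table metadata. This sketch assumes the two tables above exist; the detailed output should include a table type (external or managed) and a storage location, though the exact rows shown can vary by runtime.

```sql
-- Expect an external table type and the abfss path you supplied
DESCRIBE TABLE EXTENDED prod_analytics.silver.customer_orders;

-- Expect a managed table type and a location inside managed storage
DESCRIBE TABLE EXTENDED prod_analytics.silver.customer_segments;
```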
The Gotcha: Path Overlap
External locations must not overlap. If you create an external location for abfss://[email protected]/ (the root) and then try to create another for abfss://[email protected]/bronze/ (a subfolder), the second one will fail because its path falls within the first. Plan your external location hierarchy before creating any of them; it is much easier to design it right the first time than to reorganize it later.
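The overlap rule can be sketched as the following pair of statements. The location names `storage_root` and `bronze_subfolder` and the `<container>` and `<storage-account>` parts of the URLs are hypothetical placeholders, not values from this article.

```sql
-- First location: registers the container root
CREATE EXTERNAL LOCATION storage_root
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL prod_adls_credential);

-- Second location: a subfolder of the root registered above.
-- This statement is expected to be rejected because its path overlaps storage_root.
CREATE EXTERNAL LOCATION bronze_subfolder
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/bronze/'
WITH (STORAGE CREDENTIAL prod_adls_credential);
```

One way to avoid the conflict is to register only sibling paths (or separate containers, as in the bronze, silver, and gold zones above) and never a parent of a path you plan to register later.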