Auto-Generating Terraform from a Metadata Config Table

Shannon Lowder

25 Jun 2021 — 2 min read

The metadata-driven pipeline generator was producing ADF pipelines and Databricks notebooks from a config table. The missing piece was infrastructure: every new client data source also needed a storage container, a Databricks cluster configuration, a linked service, and sometimes a Key Vault secret reference. Those were still being created by hand from Terraform templates that were mostly duplicated across projects and occasionally diverged in ways that caused subtle environment differences.

The fix was to close the loop: if metadata drives ADF and Databricks, it should also drive Terraform.

The Config Table Extension

-- Extend the existing IngestionConfig table or use a separate InfraConfig table
CREATE TABLE [meta].[InfraConfig] (
    InfraID            INT IDENTITY(1,1) PRIMARY KEY,
    SourceID           INT NOT NULL REFERENCES [meta].[IngestionConfig](SourceID),
    StorageAccountName NVARCHAR(100) NOT NULL,
    ContainerName      NVARCHAR(100) NOT NULL,
    ClusterSizeLabel   NVARCHAR(50)  NOT NULL,  -- SMALL | MEDIUM | LARGE
    RequiresKeyVault   BIT           NOT NULL DEFAULT 0,
    SecretName         NVARCHAR(200) NULL,       -- name within KV
    Tags               NVARCHAR(MAX) NULL        -- JSON object: {env, owner, cost_center}
);
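
The table definition leaves a few rules implicit: the size label must be one of three values, a secret name is needed whenever RequiresKeyVault is set, and Tags has to hold a JSON object. A minimal sketch of a pre-generation check for those rules follows. The function name validate_infra_config and the idea of validating rows as Python dicts are assumptions for illustration, not part of the original generator.

```python
import json

# Assumed to mirror the labels listed in the ClusterSizeLabel column comment.
ALLOWED_CLUSTER_SIZES = {"SMALL", "MEDIUM", "LARGE"}

def validate_infra_config(row: dict) -> list[str]:
    """Return the problems found in one InfraConfig row (empty list if valid)."""
    problems = []
    if row.get("ClusterSizeLabel") not in ALLOWED_CLUSTER_SIZES:
        problems.append(f"unknown ClusterSizeLabel: {row.get('ClusterSizeLabel')!r}")
    # A Key Vault reference cannot be rendered without a secret name.
    if row.get("RequiresKeyVault") and not row.get("SecretName"):
        problems.append("RequiresKeyVault is set but SecretName is empty")
    tags_raw = row.get("Tags")
    if tags_raw:
        try:
            tags = json.loads(tags_raw)
        except json.JSONDecodeError:
            problems.append("Tags is not valid JSON")
        else:
            if not isinstance(tags, dict):
                problems.append("Tags must be a JSON object")
    return problems
```

Running a check like this before rendering means a bad row fails the CI job with a readable message instead of producing Terraform that fails at plan time.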

The Terraform Generator

import json
from pathlib import Path

# Maps the ClusterSizeLabel column to a node type and an autoscale ceiling.
CLUSTER_SIZE_MAP = {
    'SMALL':  {'node_type': 'Standard_DS3_v2', 'workers': 2},
    'MEDIUM': {'node_type': 'Standard_DS4_v2', 'workers': 4},
    'LARGE':  {'node_type': 'Standard_DS5_v2', 'workers': 8},
}

def render_storage_container(config: dict) -> str:
    # Tags is a JSON object stored as text; NULL or empty means no tags.
    tags = json.loads(config.get('Tags') or '{}')
    tags_tf = "\n".join(f'    {k} = "{v}"' for k, v in tags.items())
    # azurerm_storage_container has no tags argument, so the tags are
    # emitted as container metadata instead.
    metadata_tf = f"  metadata = {{\n{tags_tf}\n  }}\n" if tags else ""
    return f"""
resource "azurerm_storage_container" "{config['ContainerName']}" {{
  name                  = "{config['ContainerName']}"
  storage_account_name  = "{config['StorageAccountName']}"
  container_access_type = "private"
{metadata_tf}}}
"""

def render_databricks_cluster(config: dict, source_name: str) -> str:
    # Raises KeyError on an unknown size label, which fails the run early.
    size = CLUSTER_SIZE_MAP[config['ClusterSizeLabel']]
    return f"""
resource "databricks_cluster" "{source_name.lower()}_cluster" {{
  cluster_name            = "{source_name}-ingest"
  spark_version           = "10.4.x-scala2.12"
  node_type_id            = "{size['node_type']}"
  autotermination_minutes = 30
  autoscale {{
    min_workers = 1
    max_workers = {size['workers']}
  }}
}}
"""

def render_keyvault_reference(config: dict) -> str:
    # Only sources flagged RequiresKeyVault get a secret resource.
    if not config['RequiresKeyVault']:
        return ""
    return f"""
resource "azurerm_key_vault_secret" "{config['SecretName']}_ref" {{
  name         = "{config['SecretName']}"
  value        = var.{config['SecretName'].replace('-', '_')}_value
  key_vault_id = azurerm_key_vault.pipeline_kv.id
}}
"""

def generate_terraform_module(configs: list[dict], output_dir: str) -> None:
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    blocks = []
    for config in configs:
        # SourceName comes from the joined IngestionConfig row.
        source_name = config['SourceName']
        blocks.append(render_storage_container(config))
        blocks.append(render_databricks_cluster(config, source_name))
        blocks.append(render_keyvault_reference(config))
    # Skip the empty strings returned for sources without a Key Vault secret.
    tf_content = "\n".join(block for block in blocks if block)
    (out_dir / "generated_sources.tf").write_text(tf_content)
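
The generator interpolates ContainerName, SourceName, and SecretName directly into Terraform resource names. Terraform identifiers may contain only letters, digits, underscores, and hyphens, and must start with a letter or underscore, so a source name with a space or a leading digit would produce an invalid file. One way to guard against that is a small sanitizer like the sketch below. The helper name terraform_identifier and the "src_" prefix are hypothetical choices, not something the original generator defines.

```python
import re

def terraform_identifier(name: str) -> str:
    """Turn an arbitrary source name into a valid Terraform resource name."""
    # Replace anything outside letters, digits, underscores, and hyphens.
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", name.strip().lower())
    # Names must start with a letter or underscore; "src_" is an
    # arbitrary prefix chosen for this sketch.
    if not re.match(r"[a-z_]", cleaned):
        cleaned = "src_" + cleaned
    return cleaned
```

For example, terraform_identifier("Acme Sales 2021") gives "acme_sales_2021". Two different source names can collapse to the same identifier, so a uniqueness check across the config rows would still be needed.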

Integration with the CI/CD Pipeline

The generator runs as part of the pipeline that deploys new data sources. The sequence: new row in meta.IngestionConfig + meta.InfraConfig → CI job runs the generator → generated .tf files committed to the infra repo → Terraform plan reviewed → Terraform apply provisions the infrastructure → ADF and Databricks notebook generators run → deployment complete.
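
For the commit step in that sequence, it helps if the CI job can tell whether a generator run actually changed anything, so it only opens a commit when the config produced new Terraform. A minimal sketch, assuming the generated content is passed in as a string; write_if_changed and the header text are illustrative names, not part of the original pipeline.

```python
from pathlib import Path

# Banner marking the file as generated output (wording is an assumption).
GENERATED_HEADER = "# AUTO-GENERATED from meta.InfraConfig. Do not edit by hand.\n"

def write_if_changed(path: Path, body: str) -> bool:
    """Write header + body to path; return True only if the content changed."""
    new_content = GENERATED_HEADER + body
    if path.exists() and path.read_text() == new_content:
        return False  # Nothing to commit for this run.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_content)
    return True
```

The CI job can use the return value to decide whether to commit to the infra repo. This only works cleanly if the generator emits sources in a stable order, for example sorted by SourceID, so that unchanged config yields byte-identical output.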

No more handcrafted Terraform per source. No more infrastructure drift because someone copied last project's config and changed three of the five things that needed changing.

The Gotcha: Generated Files and Code Review

Generated Terraform files need to live in source control, but they complicate code review — a config table change that adds five sources generates five hundred lines of Terraform. The pattern that works: keep generated files in a generated/ subdirectory, note in the PR that these were auto-generated from a config change, and review the config change rather than the generated output. Trust the generator; review the config. As always, I'm here to help.
