Databricks Workflows: What the Jobs UI Became When It Grew Up

Databricks renamed Jobs to Workflows last year, and the UI redesign that came with it reflects something more fundamental: the platform has a real pipeline orchestration story now, not just a job scheduler. If you're still scheduling individual notebooks as separate jobs and managing dependencies through external orchestrators, it's worth looking at what Workflows can do natively.

What Changed from Jobs to Workflows

The old Jobs UI let you schedule a single notebook, JAR, or Python script. Multi-step pipelines required either dbutils.notebook.run() inside a driver notebook, or an external orchestrator like ADF or Airflow. Workflows changes this: you define tasks, declare dependencies between them, and Databricks executes them as a DAG. The dependency resolution, parallel execution, and failure handling are built in.

{
  "name": "order_processing_workflow",
  "tasks": [
    {
      "task_key": "ingest_orders",
      "description": "Extract raw orders from SQL Server to bronze",
      "notebook_task": {
        "notebook_path": "/Repos/data-engineering/pipelines/01_ingest_orders",
        "base_parameters": {
          "processing_date": "{{job.start_time.iso_date}}",
          "source_env": "prod"
        }
      },
      "job_cluster_key": "ingest_cluster"
    },
    {
      "task_key": "ingest_products",
      "description": "Extract product catalog; runs in parallel with order ingest",
      "notebook_task": {
        "notebook_path": "/Repos/data-engineering/pipelines/01_ingest_products"
      },
      "job_cluster_key": "ingest_cluster"
    },
    {
      "task_key": "transform_silver",
      "description": "Clean and conform orders; depends on both ingest steps",
      "depends_on": [
        {"task_key": "ingest_orders"},
        {"task_key": "ingest_products"}
      ],
      "notebook_task": {
        "notebook_path": "/Repos/data-engineering/pipelines/02_transform_orders"
      },
      "job_cluster_key": "transform_cluster"
    },
    {
      "task_key": "build_gold",
      "depends_on": [{"task_key": "transform_silver"}],
      "notebook_task": {
        "notebook_path": "/Repos/data-engineering/pipelines/03_build_gold"
      },
      "job_cluster_key": "transform_cluster",
      "max_retries": 2,
      "min_retry_interval_millis": 120000
    }
  ],
  "job_clusters": [
    {
      "job_cluster_key": "ingest_cluster",
      "new_cluster": {
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 2
      }
    },
    {
      "job_cluster_key": "transform_cluster",
      "new_cluster": {
        "spark_version": "10.4.x-scala2.12",
        "node_type_id": "Standard_DS4_v2",
        "num_workers": 4
      }
    }
  ]
}
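
To make the dependency resolution concrete, here is a minimal Python sketch that groups the tasks from the definition above into "waves" that could run in parallel. It only illustrates the ordering logic using the standard library's graphlib; it is not how Databricks schedules tasks internally (a real run starts each task as soon as its own dependencies finish rather than waiting for a whole wave). The execution_waves helper is a hypothetical name, and the task list is trimmed to the fields that affect ordering.

```python
from graphlib import TopologicalSorter

# Trimmed copy of the job definition: only the fields that affect ordering.
tasks = [
    {"task_key": "ingest_orders"},
    {"task_key": "ingest_products"},
    {
        "task_key": "transform_silver",
        "depends_on": [
            {"task_key": "ingest_orders"},
            {"task_key": "ingest_products"},
        ],
    },
    {"task_key": "build_gold", "depends_on": [{"task_key": "transform_silver"}]},
]


def execution_waves(tasks):
    """Group tasks into waves whose dependencies are all satisfied (hypothetical helper)."""
    # Map each task to the set of tasks it depends on.
    graph = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in tasks
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock its dependents
    return waves


waves = execution_waves(tasks)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

The two ingest tasks land in the first wave because neither declares depends_on, which is what lets them run in parallel.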

Per-Task Cluster Configuration

Notice the two cluster definitions: ingest_cluster for the extraction tasks (smaller, used by the two ingest tasks that run in parallel) and transform_cluster for the transformation tasks (larger, sized for the joins and aggregations). Each task names the cluster it runs on through job_cluster_key. You're not paying for a large cluster during the ingest phase or squeezing the transform phase onto a small one; you're right-sizing at the task level.
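
Because each task points at a cluster by key, a mistyped job_cluster_key is an easy error to make. The sketch below is a small pre-submission check, assuming a job definition loaded as a Python dict with the same shape as the JSON above; tasks_by_cluster is a hypothetical helper, not part of any Databricks SDK.

```python
from collections import defaultdict

# Trimmed copy of the job definition: only the cluster wiring.
job = {
    "tasks": [
        {"task_key": "ingest_orders", "job_cluster_key": "ingest_cluster"},
        {"task_key": "ingest_products", "job_cluster_key": "ingest_cluster"},
        {"task_key": "transform_silver", "job_cluster_key": "transform_cluster"},
        {"task_key": "build_gold", "job_cluster_key": "transform_cluster"},
    ],
    "job_clusters": [
        {"job_cluster_key": "ingest_cluster"},
        {"job_cluster_key": "transform_cluster"},
    ],
}


def tasks_by_cluster(job):
    """Return {cluster key: [task keys]}, failing on undefined cluster keys."""
    defined = {c["job_cluster_key"] for c in job["job_clusters"]}
    grouped = defaultdict(list)
    for task in job["tasks"]:
        key = task["job_cluster_key"]
        if key not in defined:
            raise ValueError(
                f"task {task['task_key']!r} references undefined cluster {key!r}"
            )
        grouped[key].append(task["task_key"])
    return dict(grouped)


for cluster, task_keys in tasks_by_cluster(job).items():
    print(f"{cluster}: {', '.join(task_keys)}")
```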

Conditional Execution

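{
  "task_key": "send_alert",
  "depends_on": [
    {"task_key": "build_gold"}
  ],
  "run_if": "AT_LEAST_ONE_FAILED",
  "notebook_task": {
    "notebook_path": "/Repos/data-engineering/pipelines/99_send_failure_alert"
  },
  "job_cluster_key": "ingest_cluster"
}

The run_if field on a task lets you build conditional paths. The default, ALL_SUCCESS, runs a task only when every upstream dependency succeeded; AT_LEAST_ONE_FAILED runs it only when an upstream task failed. (The outcome field on a dependency is a different mechanism: it selects the true or false branch of an upstream condition task.) A task that only runs on upstream failure (send an alert, write a failure record, trigger a compensating action) is native to Workflows without any custom conditional logic in your notebooks.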
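
In the Jobs API, run-on-failure behavior is governed by a task-level run_if condition. The sketch below is a deliberately simplified model of how a few of those conditions evaluate, assuming every upstream task ends in either SUCCESS or FAILED. Real runs also have skipped, canceled, and excluded states, and there is a NONE_FAILED condition that this two-state model cannot distinguish from ALL_SUCCESS, so treat it as an illustration rather than a specification. should_run is a hypothetical helper.

```python
def should_run(run_if, upstream_states):
    """Decide whether a task runs, given its dependencies' final states.

    Simplified two-state model: each state is "SUCCESS" or "FAILED".
    Assumes the task has at least one dependency.
    """
    succeeded = [state == "SUCCESS" for state in upstream_states]
    failed = [state == "FAILED" for state in upstream_states]
    rules = {
        "ALL_SUCCESS": all(succeeded),
        "AT_LEAST_ONE_SUCCESS": any(succeeded),
        "ALL_DONE": True,  # in this model every upstream task has finished
        "AT_LEAST_ONE_FAILED": any(failed),
        "ALL_FAILED": all(failed),
    }
    return rules[run_if]


# send_alert depends only on build_gold.
print(should_run("AT_LEAST_ONE_FAILED", ["FAILED"]))   # alert fires
print(should_run("AT_LEAST_ONE_FAILED", ["SUCCESS"]))  # alert is skipped
```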

When to Still Use an External Orchestrator

Workflows handles intra-Databricks orchestration well. It doesn't replace ADF or Airflow when your pipeline spans multiple systems: triggering a Workflow after an ADF pipeline completes, fanning out to non-Databricks compute, or coordinating across cloud boundaries. Workflows is a strong orchestrator for Databricks workloads; use a separate orchestrator for the cross-system coordination layer.
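
When an external orchestrator owns the cross-system layer, its hand-off to Workflows is typically a single REST call to the Jobs API run-now endpoint. The sketch below only builds that request with the standard library so the shape is visible; the workspace URL, token, and job ID are hypothetical placeholders, build_run_now_request is an illustrative helper, and in practice ADF or Airflow would more likely use their own Databricks connectors than hand-rolled HTTP.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id, notebook_params=None):
    """Build (but do not send) a POST to the Jobs API run-now endpoint."""
    payload = {"job_id": job_id}
    if notebook_params:
        # Parameters passed through to the job's notebook tasks.
        payload["notebook_params"] = notebook_params
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical placeholder values; substitute your own workspace, token, and job.
request = build_run_now_request(
    host="https://example-workspace.azuredatabricks.net",
    token="EXAMPLE_TOKEN",
    job_id=123,
    notebook_params={"source_env": "prod"},
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose: urllib.request.urlopen(request) would trigger the run.
```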
