Delta Live Tables: First Look at Databricks' Declarative Pipeline Framework
Delta Live Tables has been in preview for a few months and I've been watching it closely. The concept is appealing: instead of writing notebook code that explicitly manages the read-transform-write lifecycle, you declare what the data should look like at each stage and DLT figures out the execution order, dependencies, and lineage automatically. It's the declarative approach applied to data pipelines.
Here's what it looks like in practice, what I like, and where I still have questions.
What DLT Actually Is
Delta Live Tables is a framework that runs inside Databricks. You write notebooks in Python or SQL; in Python, you define each table as a function decorated with @dlt.table, and DLT builds a DAG from the dependencies between those functions. It handles execution order, retries, monitoring, and lineage tracking automatically.
import dlt
from pyspark.sql.functions import col, count, sum as sum_, to_date


# Bronze: ingest raw JSON files incrementally with Auto Loader.
@dlt.table(
    name="orders_bronze",
    comment="Raw order records from ADLS",
    table_properties={"delta.autoOptimize.optimizeWrite": "true"},
)
def orders_bronze():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/orders/")
    )


# Silver: conform types; rows failing either expectation are dropped.
@dlt.table(
    name="orders_silver",
    comment="Cleaned and conformed order records",
)
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("positive_amount", "order_amount > 0")
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("order_date", to_date(col("order_date_str"), "yyyy-MM-dd"))
        .withColumn("order_amount", col("order_amount").cast("decimal(18,2)"))
        .drop("order_date_str")
    )


# Gold: daily totals per region. Explicit aliases avoid auto-generated
# names like "sum(order_amount)", which Delta rejects unless column
# mapping is enabled.
@dlt.table(name="orders_gold_daily_summary")
def orders_gold_daily_summary():
    return (
        dlt.read("orders_silver")
        .groupBy("order_date", "region_code")
        .agg(
            sum_("order_amount").alias("total_order_amount"),
            count("order_id").alias("order_count"),
        )
    )
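To make the DAG idea concrete, here is a toy sketch in plain Python. It is not DLT's actual resolver. It assumes the upstream reads for each table have already been collected into a dictionary (the `upstream` mapping and the `execution_order` helper are hypothetical names for illustration) and uses the standard library's `graphlib` to derive an order in which every table runs after its inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: table name -> set of tables it reads from.
# The entries mirror the dlt.read / dlt.read_stream calls in the pipeline above.
upstream = {
    "orders_bronze": set(),
    "orders_silver": {"orders_bronze"},
    "orders_gold_daily_summary": {"orders_silver"},
}


def execution_order(dependencies):
    """Return table names ordered so that each table follows its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(upstream)
print(order)
```

The point of the sketch is only that the order falls out of the declared reads; nothing in the pipeline code states it explicitly.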
What I Like
The expectation decorators are genuinely useful. @dlt.expect_or_drop removes records that fail validation before they get written, and it logs the failure rate automatically. You can also use @dlt.expect_or_fail (halt the pipeline on violations) or @dlt.expect (log violations but continue). That's three levels of data quality enforcement with one line of code per rule.
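A toy model can make the three levels concrete. The sketch below is plain Python over dictionaries, not DLT code, and it only imitates the behavior described above: "warn" stands in for @dlt.expect, "drop" for @dlt.expect_or_drop, and "fail" for @dlt.expect_or_fail. The `apply_expectations` function, the `ExpectationFailed` exception, and the `has_region` rule are all made up for illustration.

```python
from collections import Counter


class ExpectationFailed(Exception):
    """Raised when a 'fail' rule is violated (stands in for a halted update)."""


def apply_expectations(records, rules):
    """Apply (name, predicate, action) rules to a list of dict records.

    action is "warn", "drop", or "fail". Returns the surviving records
    and a per-rule count of violations.
    """
    violations = Counter()
    kept = []
    for record in records:
        keep = True
        for name, predicate, action in rules:
            if predicate(record):
                continue
            violations[name] += 1
            if action == "fail":
                raise ExpectationFailed(f"{name} violated by {record!r}")
            if action == "drop":
                keep = False
            # "warn": the violation is counted but the record is kept
        if keep:
            kept.append(record)
    return kept, dict(violations)


orders = [
    {"order_id": 1, "order_amount": 25.0, "region_code": "EU"},
    {"order_id": None, "order_amount": 10.0, "region_code": "US"},
    {"order_id": 3, "order_amount": -5.0, "region_code": None},
]

rules = [
    ("valid_order_id", lambda r: r["order_id"] is not None, "drop"),
    ("positive_amount", lambda r: r["order_amount"] > 0, "drop"),
    ("has_region", lambda r: r["region_code"] is not None, "warn"),  # hypothetical rule
]

kept, violations = apply_expectations(orders, rules)
print(kept)
print(violations)
```

Only the first order survives here: the second and third each trip a "drop" rule, and the third also trips the "warn" rule, which is counted without affecting whether the record is kept.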
Lineage is free. DLT tracks which tables feed which tables and surfaces it in the pipeline DAG view in the Databricks UI. For audit and debugging purposes, that visibility has immediate value without any extra instrumentation.
What I'm Still Working Through
DLT runs in its own compute environment: a DLT cluster, not your standard interactive or job cluster. Configuration of that cluster is less flexible than a standard cluster config. There are currently constraints on which Databricks features work in a DLT pipeline (some libraries, some Spark config settings), and the documentation on what's supported isn't always complete.
Testing DLT pipelines locally is harder than testing standard notebook code. The DLT framework isn't available outside a running pipeline context, so unit testing the transformation logic requires some structural gymnastics to isolate the functions from the decorator framework.
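One way to do those gymnastics, sketched under some assumptions: keep the transformation in a plain function, keep the decorated function as a thin wrapper, and register a pass-through stand-in for the dlt module in test setup so the pipeline file can be imported at all. The stub below is my own illustration, not something Databricks ships, and `install_dlt_stub` and `conform_orders` are hypothetical names. To stay free of a Spark dependency, plain dictionaries stand in for DataFrame rows; in a real test the function would take and return a DataFrame built from a local SparkSession.

```python
import sys
import types
from datetime import datetime
from decimal import Decimal


def _passthrough(*args, **kwargs):
    """Stand-in for dlt.table / dlt.expect*: leaves the function untouched."""
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return args[0]  # bare usage: @dlt.table
    return lambda fn: fn  # parameterized usage: @dlt.table(name=...)


def install_dlt_stub():
    """Register a fake 'dlt' module so pipeline code imports outside a pipeline."""
    stub = types.ModuleType("dlt")
    for name in ("table", "view", "expect", "expect_or_drop", "expect_or_fail"):
        setattr(stub, name, _passthrough)
    sys.modules["dlt"] = stub  # test setup only; never do this in a pipeline
    return stub


install_dlt_stub()
import dlt  # resolves to the stub registered above


def conform_orders(rows):
    """Pure transformation mirroring the silver step; no dlt calls inside."""
    conformed = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != "order_date_str"}
        out["order_date"] = datetime.strptime(row["order_date_str"], "%Y-%m-%d").date()
        out["order_amount"] = Decimal(str(row["order_amount"])).quantize(Decimal("0.01"))
        conformed.append(out)
    return conformed


@dlt.table(name="orders_silver")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def orders_silver():
    # Thin wrapper: the only place that touches the framework.
    # Not called in tests, because the stub has no read_stream.
    return conform_orders(dlt.read_stream("orders_bronze"))


result = conform_orders(
    [{"order_id": 1, "order_date_str": "2023-01-15", "order_amount": "19.9"}]
)
print(result)
```

The trade-off is that the stub skips expectations entirely, so a test like this covers the transformation logic but says nothing about which rows DLT would drop.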
The Migration Question
If you have existing medallion architecture notebooks that work, should you migrate them to DLT? For new pipelines, I'd start with DLT and evaluate. For existing pipelines that are stable and well understood, the migration cost probably doesn't pay off unless you specifically need the built-in data quality tracking or the lineage visibility. DLT is still in preview; I'd let it bake a bit more before retrofitting production pipelines. As always, I'm here to help.