Photon in Production: Six Months of Real Query Performance Data

Shannon Lowder

24 Jun 2022 — 2 min read

Photon launched on DBR 9.1 last November. I've been running it in production on three different workloads since then — a daily batch aggregation pipeline, a large MERGE-based SCD2 dimension load, and an ad-hoc analytical query layer that analysts run throughout the day. Six months of production data gives a clearer picture than the initial benchmarks.

The Batch Aggregation Pipeline

This is where Photon performs best. The daily order aggregation scans roughly 900 million rows and produces a regional summary. Before Photon on DBR 9.1: average runtime 4 minutes 22 seconds over a 30-day period. After migration to a Photon-enabled DBR 10.4 cluster (same instance type, same node count): average runtime 1 minute 54 seconds. Roughly 2.3x improvement, consistent across the measurement period.

The improvement is almost entirely in the scan and GROUP BY phase. The query doesn't use Python UDFs or anything that would fall through to standard Spark execution.

The SCD2 Dimension Load

More mixed results here. The dimension table has about 40 million rows. The SCD2 MERGE involves reading the full current dimension, joining to the incoming delta, detecting changes, closing out changed rows, and inserting new current rows.

Standard DBR: 8 minutes 15 seconds average. Photon DBR: 6 minutes 40 seconds. About 19% improvement — real, but not the 2x+ you'd expect from the scan-heavy operations. The MERGE itself has some Python-adjacent overhead in the Delta merge execution path that isn't fully vectorized in the current Photon implementation.

The Analyst Query Layer

This is the most variable workload — queries range from simple lookups to multi-table joins and window functions, run interactively, with wildly different data volumes. The aggregate experience: queries that do large scans and GROUP BYs improved 1.5-2.5x. Single-row lookups and small result set queries — no meaningful difference.

-- This type of query: significant Photon benefit
SELECT
product_category,
region_code,
YEAR(order_date) AS order_year,
MONTH(order_date) AS order_month,
SUM(order_amount) AS revenue,
COUNT(*) AS order_count,
COUNT(DISTINCT customer_id) AS unique_customers,
PERCENTILE_APPROX(order_amount, 0.95) AS p95_order_amount
FROM prod_analytics.gold.order_line_items
WHERE order_date >= '2021-01-01'
GROUP BY product_category, region_code, order_year, order_month;

-- This type: minimal Photon benefit
SELECT * FROM prod_analytics.gold.customers WHERE customer_id = 'CUST-00042';

The Cost Math

On Azure, Photon-enabled DBR nodes cost approximately 2x standard DBR nodes in DBU terms (the exact multiplier varies by instance type and agreement — check your pricing). For the aggregation pipeline, Photon cuts runtime by 57%, which more than offsets the 2x DBU cost on the same cluster. Net compute cost: down about 28%.

For the SCD2 dimension load (19% runtime improvement, 2x DBU cost), Photon is a net cost increase. I kept it on the same cluster because the other workloads that share that cluster are scan-heavy and benefit overall — but if the SCD2 were on a dedicated cluster, I'd run it on standard DBR.

The Rule of Thumb

Photon helps when the bottleneck is large-scale scan, aggregation, or join performance. It doesn't help when the bottleneck is Python UDFs, small data, single-row lookups, or write-heavy merge operations. Benchmark on your specific workloads before assuming the 2x marketing claim applies. As always, I'm here to help.

Photon in Production: Six Months of Real Query Performance Data

Shannon Lowder

The Batch Aggregation Pipeline

The SCD2 Dimension Load

The Analyst Query Layer

The Cost Math

The Rule of Thumb

Read more

The Context Problem Neither Agent Mesh Nor OpenSharing Solves

Unity AI Gateway and What a Governed Model Access Layer Actually Buys You

You Don't Need Fable. You Need a Router.

DAIS 2026: Genie One and the Context Problem Databricks Is Solving