Databricks SQL Warehouses: Running Analytical Queries on Delta Lake Without Managing a Cluster

All-purpose Databricks clusters are flexible — attach notebooks, run streaming jobs, train models. They're also sized and managed for those mixed workloads. If your use case is "BI analysts running SQL queries against Delta tables," there's a better-suited compute option: SQL Warehouses.

What SQL Warehouses Are

A SQL Warehouse is compute optimized for SQL Analytics. Introduced in late 2020 as "SQL Endpoints," they have a few key characteristics that differ from all-purpose clusters:

  • Auto-suspend: The warehouse automatically suspends after a configurable idle period (as low as 10 minutes). No DBUs accrue while suspended. Resumes in under 30 seconds for most warehouse sizes.
  • Auto-scaling: Scale up by adding clusters (not nodes). A "Small" SQL Warehouse has 1 cluster with 2 nodes. Adding a second cluster doubles the query concurrency capacity.
  • Serverless SQL Warehouses: No underlying cluster to manage at all — Databricks manages the compute, and you pay per statement execution time.
  • BI integration: JDBC/ODBC endpoints let Power BI, Tableau, Looker, and other BI tools connect directly without any Python code.

Sizing

SQL Warehouses use T-shirt sizing: 2X-Small through 4X-Large. Each size doubles the underlying compute resources. The right size depends on your data volume and query complexity:

  • 2X-Small / X-Small: Dashboard queries, small aggregations, lightweight exploratory queries on well-partitioned tables
  • Small / Medium: Moderate analytical queries, tables up to a few hundred GB, moderate concurrency
  • Large / X-Large: Complex queries over TB-scale data, heavy concurrency from many simultaneous BI users

For most BI-facing workloads on well-maintained Delta tables (OPTIMIZE and Z-ORDER applied), a Small warehouse handles a team of 5-10 analysts. Scale up when you see query queue time in the Warehouse monitoring dashboard.

Connecting Power BI

# Get connection details from Warehouse settings:
# Server hostname: adb-XXXXXXXX.azuredatabricks.net
# HTTP path: /sql/1.0/warehouses/WAREHOUSE_ID

# Power BI connection (via Get Data > Azure Databricks):
# Host: adb-XXXXXXXX.azuredatabricks.net
# HTTP Path: /sql/1.0/warehouses/WAREHOUSE_ID
# Authentication: Personal Access Token

From Power BI's perspective, a Databricks SQL Warehouse looks like a standard SQL database. You can browse tables in the Unity Catalog or Hive Metastore, import tables, or write DirectQuery mode queries. DirectQuery on a SQL Warehouse gives you near-real-time data in Power BI with the query executing against Delta Lake on each report refresh — no data copy required.

Photon Engine

SQL Warehouses run on the Photon engine by default — Databricks' native vectorized query execution engine written in C++. For SQL analytical workloads (scans, aggregations, joins on Delta Lake tables), Photon significantly outperforms the JVM-based Spark execution engine that all-purpose clusters use. Benchmarks in 2021-2022 show 2-5x query speedup on typical BI queries.

Photon also handles the Parquet and Delta file formats with native implementations of predicate pushdown, dictionary encoding, and late materialization — optimizations that matter for column-selective SQL analytical queries.

The Cost Model Difference

All-purpose clusters charge DBUs from the moment the cluster starts until it's terminated. SQL Warehouses charge DBUs only when queries are actively executing (Classic warehouses) or per compute-second of query execution (Serverless). For BI workloads that are active during business hours and idle overnight and on weekends, the SQL Warehouse model can be 60-80% cheaper than running an equivalent all-purpose cluster 24/7.

When to Use SQL Warehouses vs. All-Purpose Clusters

SQL Warehouse: BI dashboards, ad-hoc SQL analytics, Power BI / Tableau / Looker integration, scheduled SQL queries, read-heavy analytical workloads, large teams of SQL-fluent analysts who don't write Python.

All-Purpose Cluster: ETL notebooks that mix Python and SQL, ML training, streaming pipelines, complex multi-step transformations, exploratory data science, anything that needs the full PySpark API.

For the full lakehouse architecture, you end up with both: all-purpose clusters for pipeline development and ETL jobs, SQL Warehouses for the BI and analytics access layer on top of the same Delta tables.

Read more