All-purpose Databricks clusters are flexible — attach notebooks, run streaming jobs, train models. They're also sized and managed for those mixed workloads. If your use case is "BI analysts running SQL queries against Delta tables," there's a better-suited compute option: SQL Warehouses.
What SQL Warehouses Are
A SQL Warehouse is compute optimized for SQL Analytics. Introduced in late 2020 as "SQL Endpoints," they have a few key characteristics that differ from all-purpose clusters:
- Auto-suspend: The warehouse automatically suspends after a configurable idle period (as low as 10 minutes). No DBUs accrue while suspended. Resumes in under 30 seconds for most warehouse sizes.
- Auto-scaling: Scale up by adding clusters (not nodes). A "Small" SQL Warehouse has 1 cluster with 2 nodes. Adding a second cluster doubles the query concurrency capacity.
- Serverless SQL Warehouses: No underlying cluster to manage at all — Databricks manages the compute, and you pay per statement execution time.
- BI integration: JDBC/ODBC endpoints let Power BI, Tableau, Looker, and other BI tools connect directly without any Python code.
Sizing
SQL Warehouses use T-shirt sizing: 2X-Small through 4X-Large. Each size doubles the underlying compute resources. The right size depends on your data volume and query complexity:
- 2X-Small / X-Small: Dashboard queries, small aggregations, lightweight exploratory queries on well-partitioned tables
- Small / Medium: Moderate analytical queries, tables up to a few hundred GB, moderate concurrency
- Large / X-Large: Complex queries over TB-scale data, heavy concurrency from many simultaneous BI users
For most BI-facing workloads on well-maintained Delta tables (OPTIMIZE and Z-ORDER applied), a Small warehouse handles a team of 5-10 analysts. Scale up when you see query queue time in the Warehouse monitoring dashboard.
Connecting Power BI
# Get connection details from Warehouse settings:
# Server hostname: adb-XXXXXXXX.azuredatabricks.net
# HTTP path: /sql/1.0/warehouses/WAREHOUSE_ID
# Power BI connection (via Get Data > Azure Databricks):
# Host: adb-XXXXXXXX.azuredatabricks.net
# HTTP Path: /sql/1.0/warehouses/WAREHOUSE_ID
# Authentication: Personal Access Token
From Power BI's perspective, a Databricks SQL Warehouse looks like a standard SQL database. You can browse tables in the Unity Catalog or Hive Metastore, import tables, or write DirectQuery mode queries. DirectQuery on a SQL Warehouse gives you near-real-time data in Power BI with the query executing against Delta Lake on each report refresh — no data copy required.
Photon Engine
SQL Warehouses run on the Photon engine by default — Databricks' native vectorized query execution engine written in C++. For SQL analytical workloads (scans, aggregations, joins on Delta Lake tables), Photon significantly outperforms the JVM-based Spark execution engine that all-purpose clusters use. Benchmarks in 2021-2022 show 2-5x query speedup on typical BI queries.
Photon also handles the Parquet and Delta file formats with native implementations of predicate pushdown, dictionary encoding, and late materialization — optimizations that matter for column-selective SQL analytical queries.
The Cost Model Difference
All-purpose clusters charge DBUs from the moment the cluster starts until it's terminated. SQL Warehouses charge DBUs only when queries are actively executing (Classic warehouses) or per compute-second of query execution (Serverless). For BI workloads that are active during business hours and idle overnight and on weekends, the SQL Warehouse model can be 60-80% cheaper than running an equivalent all-purpose cluster 24/7.
When to Use SQL Warehouses vs. All-Purpose Clusters
SQL Warehouse: BI dashboards, ad-hoc SQL analytics, Power BI / Tableau / Looker integration, scheduled SQL queries, read-heavy analytical workloads, large teams of SQL-fluent analysts who don't write Python.
All-Purpose Cluster: ETL notebooks that mix Python and SQL, ML training, streaming pipelines, complex multi-step transformations, exploratory data science, anything that needs the full PySpark API.
For the full lakehouse architecture, you end up with both: all-purpose clusters for pipeline development and ETL jobs, SQL Warehouses for the BI and analytics access layer on top of the same Delta tables.