If you've adopted dbt for transformation, you've already encountered its built-in tests: not_null, unique, accepted_values, relationships. They're fast to write, they run as part of your dbt build, and for simple constraints they're the right tool. The question I keep hearing is: if dbt has tests, do I still need Great Expectations?
Yes — and they don't overlap as much as they appear to. Here's where the line is.
What dbt Tests Are Good At
dbt tests are SQL-based assertions that run against your transformed models. They excel at:
- Schema-level contracts: column must be unique, column must not be null, values must be from an accepted set, foreign key relationships must hold
- Fast feedback in the build:
dbt testruns alongsidedbt runas part of the same workflow - Low-friction setup: four generic tests out of the box, defined in YAML alongside the model
# models/storm_events.yml
models:
- name: storm_events_silver
columns:
- name: event_id
tests:
- not_null
- unique
- name: event_type
tests:
- accepted_values:
values: ['Tornado', 'Hail', 'Flash Flood', 'Thunderstorm Wind']
- name: state_fips
tests:
- relationships:
to: ref('dim_states')
field: fips_code
This covers a lot of ground quickly. For teams new to data testing, dbt tests are the right starting point.
Where dbt Tests Fall Short
dbt tests run against the transformed model output — they don't validate incoming data before transformations run. They can't catch bad data at ingestion; they catch it after the damage is done (the bad data is in your model).
They also don't express statistical constraints well. dbt has no built-in test for "the mean of this column should be between 5 and 15" or "the null rate on this column has increased by more than 5% compared to last week." These require more sophistication than a SQL WHERE clause.
And dbt tests are tightly coupled to your SQL models. For data that enters your pipeline from outside dbt — files, API responses, streaming sources — dbt tests aren't in the picture.
Where Great Expectations Picks Up
GE covers the layers dbt doesn't:
- Ingestion gates: validate raw data before it enters your dbt models at all
- Statistical expectations: mean, median, standard deviation, proportion checks
- Distribution monitoring: detect drift in feature distributions between batches
- Non-SQL data sources: files, Spark DataFrames, pandas DataFrames
- Cross-batch expectations: comparing this batch to a reference batch
The Combined Architecture
# Pipeline sequence with both tools
1. Raw file arrives → GE validates against raw suite → halt if invalid
2. dbt run builds silver models from validated raw
3. dbt test validates silver model contracts (not_null, unique, accepted_values)
4. GE validates silver output for statistical properties (distributions, drift)
5. dbt run builds gold models
6. dbt test validates gold model contracts
7. GE validates gold outputs are within expected ranges before serving
dbt tests own the structural contracts on SQL model outputs. GE owns the ingestion gates, statistical checks, and non-SQL validation. There's minimal overlap, and the combination is stronger than either alone.
The practical starting point: adopt dbt tests first (they're lower friction), then add GE for the ingestion layer and statistical cases where dbt tests can't reach. As always, I'm here to help.