Great Expectations and dbt: Two Data Quality Tools That Are Better Together

If you've adopted dbt for transformation, you've already encountered its built-in tests: not_null, unique, accepted_values, relationships. They're fast to write, they run as part of your dbt build, and for simple constraints they're the right tool. The question I keep hearing is: if dbt has tests, do I still need Great Expectations?

Yes — and they don't overlap as much as they appear to. Here's where the line is.

What dbt Tests Are Good At

dbt tests are SQL-based assertions that run against your transformed models. They excel at:

Schema-level contracts: column must be unique, column must not be null, values must be from an accepted set, foreign key relationships must hold
Fast feedback in the build: dbt test runs alongside dbt run as part of the same workflow
Low-friction setup: four generic tests out of the box, defined in YAML alongside the model

# models/storm_events.yml
models:
  - name: storm_events_silver
    columns:
      - name: event_id
        tests:
          - not_null
          - unique
      - name: event_type
        tests:
          - accepted_values:
              values: ['Tornado', 'Hail', 'Flash Flood', 'Thunderstorm Wind']
      - name: state_fips
        tests:
          - relationships:
              to: ref('dim_states')
              field: fips_code

This covers a lot of ground quickly. For teams new to data testing, dbt tests are the right starting point.

Where dbt Tests Fall Short

dbt tests run against the transformed model output — they don't validate incoming data before transformations run. They can't catch bad data at ingestion; they catch it after the damage is done (the bad data is in your model).

They also don't express statistical constraints well. dbt has no built-in test for "the mean of this column should be between 5 and 15" or "the null rate on this column has increased by more than 5% compared to last week." These require more sophistication than a SQL WHERE clause.

And dbt tests are tightly coupled to your SQL models. For data that enters your pipeline from outside dbt — files, API responses, streaming sources — dbt tests aren't in the picture.

Where Great Expectations Picks Up

GE covers the layers dbt doesn't:

Ingestion gates: validate raw data before it enters your dbt models at all
Statistical expectations: mean, median, standard deviation, proportion checks
Distribution monitoring: detect drift in feature distributions between batches
Non-SQL data sources: files, Spark DataFrames, pandas DataFrames
Cross-batch expectations: comparing this batch to a reference batch

The Combined Architecture

# Pipeline sequence with both tools
1. Raw file arrives → GE validates against raw suite → halt if invalid
2. dbt run builds silver models from validated raw
3. dbt test validates silver model contracts (not_null, unique, accepted_values)
4. GE validates silver output for statistical properties (distributions, drift)
5. dbt run builds gold models
6. dbt test validates gold model contracts
7. GE validates gold outputs are within expected ranges before serving

dbt tests own the structural contracts on SQL model outputs. GE owns the ingestion gates, statistical checks, and non-SQL validation. There's minimal overlap, and the combination is stronger than either alone.

The practical starting point: adopt dbt tests first (they're lower friction), then add GE for the ingestion layer and statistical cases where dbt tests can't reach. As always, I'm here to help.

Great Expectations and dbt: Two Data Quality Tools That Are Better Together

Shannon Lowder

What dbt Tests Are Good At

Where dbt Tests Fall Short

Where Great Expectations Picks Up

The Combined Architecture

Read more

The Context Problem Neither Agent Mesh Nor OpenSharing Solves

Unity AI Gateway and What a Governed Model Access Layer Actually Buys You

You Don't Need Fable. You Need a Router.

DAIS 2026: Genie One and the Context Problem Databricks Is Solving