Data Contracts and Great Expectations: Enforcing Agreements Between Producers and Consumers

Data contracts have become a hot topic in the data engineering community over the last couple of years. The idea isn't new — the concept of a formal agreement between a data producer and a data consumer about the shape, semantics, and quality of data has been implicit in every ETL integration ever built. What's new is making those agreements explicit, versioned, and machine-enforceable rather than living in a wiki page and two engineers' shared memory.

Great Expectations expectation suites are, in practice, data contracts. Here's how to use them that way deliberately.

What a Data Contract Actually Is

A data contract specifies:

Schema: which columns, what types, what's nullable
Semantics: what values are valid, what ranges are expected, what relationships must hold
SLAs: timeliness, freshness, row count expectations
Ownership: who produces it, who consumes it, who to contact when it breaks

An expectation suite naturally covers the first two. It can partially cover the third through row count expectations and freshness checks on timestamp columns. The fourth cannot be expressed as an expectation; it is metadata you have to record yourself.
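
As a rough illustration of that split, the sketch below tags a handful of expectations by the part of the contract they cover. The expectation type names are real Great Expectations expectations, written in the legacy (pre-1.0) dictionary form. The CONTRACT_PART mapping, the coverage_by_part helper, and the magnitude bounds are assumptions invented for this example.

```python
from collections import defaultdict

# Hypothetical mapping from expectation type to the contract part it covers.
CONTRACT_PART = {
    "expect_column_to_exist": "schema",
    "expect_column_values_to_be_of_type": "schema",
    "expect_column_values_to_not_be_null": "schema",
    "expect_column_values_to_be_between": "semantics",
    "expect_column_values_to_be_in_set": "semantics",
    "expect_table_row_count_to_be_between": "sla",
}

# Illustrative expectations; the magnitude bounds are made up.
expectations = [
    {"expectation_type": "expect_column_to_exist",
     "kwargs": {"column": "event_id"}},
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "event_id"}},
    {"expectation_type": "expect_column_values_to_be_between",
     "kwargs": {"column": "magnitude", "min_value": 0, "max_value": 500}},
    {"expectation_type": "expect_table_row_count_to_be_between",
     "kwargs": {"min_value": 5000}},
]


def coverage_by_part(expectations):
    """Group expectation types by contract part; unknown types go under 'unmapped'."""
    grouped = defaultdict(list)
    for exp in expectations:
        part = CONTRACT_PART.get(exp["expectation_type"], "unmapped")
        grouped[part].append(exp["expectation_type"])
    return dict(grouped)


coverage = coverage_by_part(expectations)
for part, types in coverage.items():
    print(part, types)
```

Nothing lands under an ownership heading, which is the point: that part of the contract has to live somewhere other than the expectations themselves.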

Structuring Suites as Formal Contracts

Both expectation suites and individual expectations in Great Expectations (GE) have a meta field that accepts arbitrary JSON. Use it to embed contract metadata:

{
  "expectation_suite_name": "storm_events_silver",
  "meta": {
    "contract_version": "2.1.0",
    "data_asset": "storm.silver_events",
    "producer": "storm-ingestion-pipeline",
    "consumers": ["model-training-pipeline", "storm-analytics-dashboard"],
    "owner": "[email protected]",
    "sla": {
      "freshness_hours": 24,
      "min_daily_row_count": 5000
    },
    "changelog": [
      {"version": "2.1.0", "date": "2023-08-01", "change": "Added magnitude validation after Q2 data issues"},
      {"version": "2.0.0", "date": "2023-01-15", "change": "Added event_type set constraint — breaking change"},
      {"version": "1.0.0", "date": "2021-03-08", "change": "Initial contract"}
    ]
  },
  "expectations": [...]
}
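
Because the metadata is plain JSON, it can be checked in CI before a suite is merged. The following is a minimal sketch, not a Great Expectations feature: REQUIRED_META_KEYS and lint_contract_meta are hypothetical names, the owner address is a placeholder, and the check assumes the changelog is ordered newest first, as in the example above.

```python
import json

# Hypothetical list of metadata keys every contract must carry.
REQUIRED_META_KEYS = {
    "contract_version", "data_asset", "producer",
    "consumers", "owner", "sla", "changelog",
}


def lint_contract_meta(suite):
    """Return a list of problems found in the suite's contract metadata."""
    problems = []
    meta = suite.get("meta", {})
    missing = sorted(REQUIRED_META_KEYS - set(meta))
    if missing:
        problems.append("missing meta keys: " + ", ".join(missing))
    # Assumes the changelog lists the newest version first.
    changelog = meta.get("changelog", [])
    if changelog and changelog[0].get("version") != meta.get("contract_version"):
        problems.append("contract_version does not match the newest changelog entry")
    return problems


suite = json.loads("""
{
  "expectation_suite_name": "storm_events_silver",
  "meta": {
    "contract_version": "2.1.0",
    "data_asset": "storm.silver_events",
    "producer": "storm-ingestion-pipeline",
    "consumers": ["model-training-pipeline", "storm-analytics-dashboard"],
    "owner": "owner@example.com",
    "sla": {"freshness_hours": 24, "min_daily_row_count": 5000},
    "changelog": [
      {"version": "2.1.0", "date": "2023-08-01", "change": "Added magnitude validation"},
      {"version": "2.0.0", "date": "2023-01-15", "change": "Added event_type set constraint"}
    ]
  },
  "expectations": []
}
""")

print(lint_contract_meta(suite))
```

A non-empty result would fail the build, which keeps the version field and the changelog from drifting apart.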

Consumer-Driven Contract Testing

The most powerful contract pattern in software engineering is consumer-driven: the consumer defines the minimum contract it needs, and the producer validates that its output meets that contract. The same principle applies to data.

If your ML training pipeline only needs event_id, event_date, magnitude, and state, define a consumer suite that asserts exactly those columns and their constraints. The producer runs both its own suite (full output contract) and the consumer suite (does the consumer's minimum requirement hold?) before delivery:

def deliver_to_training_pipeline(silver_df):
    """Validate silver_df against both contracts, then deliver it.

    run_checkpoint and write_to_training_feature_store are assumed to be
    project-specific helpers defined elsewhere.
    """
    # Producer suite: full output contract
    producer_result = run_checkpoint("storm_silver_producer", silver_df)
    if not producer_result.success:
        raise RuntimeError("Producer suite failed; do not deliver")

    # Consumer suite: minimum contract for training pipeline
    consumer_result = run_checkpoint("storm_silver_for_training", silver_df)
    if not consumer_result.success:
        raise RuntimeError("Consumer contract not met; coordinate with training team")

    # Deliver only after both contracts hold
    write_to_training_feature_store(silver_df)
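
A cheaper static check can run before any data is validated: confirm that every column the consumer suite mentions is also covered by the producer suite. The sketch below works on suites in the legacy (pre-1.0) dictionary form. The helper names, the make_suite builder, and the reuse of the checkpoint names as suite names are assumptions for illustration.

```python
def columns_in_suite(suite):
    """Collect every column named in the suite's expectation kwargs."""
    return {
        exp["kwargs"]["column"]
        for exp in suite.get("expectations", [])
        if "column" in exp.get("kwargs", {})
    }


def uncovered_consumer_columns(producer_suite, consumer_suite):
    """Columns the consumer relies on that the producer suite never mentions."""
    return columns_in_suite(consumer_suite) - columns_in_suite(producer_suite)


def make_suite(name, columns):
    """Hypothetical builder: one expect_column_to_exist per column."""
    return {
        "expectation_suite_name": name,
        "expectations": [
            {"expectation_type": "expect_column_to_exist", "kwargs": {"column": c}}
            for c in columns
        ],
    }


producer_suite = make_suite(
    "storm_silver_producer",
    ["event_id", "event_date", "magnitude", "state", "event_type"],
)
consumer_suite = make_suite(
    "storm_silver_for_training",
    ["event_id", "event_date", "magnitude", "state"],
)

uncovered = uncovered_consumer_columns(producer_suite, consumer_suite)
print(sorted(uncovered))
```

An empty result means the consumer's needs are a subset of what the producer already promises. Anything else is a column the training team depends on without a producer-side guarantee.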

Breaking vs. Non-Breaking Contract Changes

Apply semantic versioning to your expectation suites:

Patch (1.0.x): tightening a range, adding a mostly threshold, fixing a typo in meta
Minor (1.x.0): adding new expectations (consumers need to verify they still pass)
Major (x.0.0): removing columns, renaming columns, changing types, removing expectations that consumers depended on

Major contract changes require explicit coordination with consumers. The version in the suite meta makes the change type visible in the Git diff. A major version bump in a PR review is a signal that downstream teams need to be notified before merge.
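
The review signal can also be automated. This is a minimal sketch, assuming the old and new contract_version strings have already been read from the suite meta on the base and head branches. The function names parse_version, bump_type, and needs_consumer_signoff are hypothetical.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_type(old, new):
    """Classify the change from old to new as major, minor, patch, or none."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v == old_v:
        return "none"
    if new_v < old_v:
        raise ValueError(f"contract version went backwards: {old} -> {new}")
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


def needs_consumer_signoff(old, new):
    """Only major bumps require notifying downstream teams before merge."""
    return bump_type(old, new) == "major"


print(bump_type("1.0.0", "2.0.0"))
print(bump_type("2.0.0", "2.1.0"))
print(needs_consumer_signoff("2.1.0", "3.0.0"))
```

A CI job could call needs_consumer_signoff and block the merge until the teams listed under consumers in the suite meta have approved.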

Making data contracts explicit and versioned doesn't eliminate the coordination overhead of schema changes — but it makes that coordination happen deliberately and in advance, rather than reactively when something breaks. As always, I'm here to help.

Shannon Lowder
