2025 in Data Engineering: The Five Patterns That Actually Held Up

Shannon Lowder

23 Dec 2025

It's been a year full of announcements. Stepping back from the release cadence, here's what I've seen actually prove out in production: not in demos, not in conference keynotes, but in client work and my own infrastructure.

1. The Agent-Augmented Pipeline Is Real and Deployable

LLM-assisted pipeline orchestration — where an agent classifies failures, routes to remediation, and escalates only when it's genuinely uncertain — is working in production. The key insight is narrow scope: agents that do one thing well (triage pipeline failures) outperform agents that try to do everything (manage the entire data platform). Start narrow, earn trust, expand scope.
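To make "narrow scope" concrete, here is a minimal sketch of a triage-only agent loop: classify the failure, route to a known remediation, and escalate when the classifier is unsure or the category is unrecognized. Everything named here is an assumption for illustration: the failure categories, the remediation names, the 0.8 confidence threshold, and the keyword classifier, which stands in for whatever LLM call you would actually make.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical failure categories mapped to hypothetical remediation names.
REMEDIATIONS = {
    "transient_network": "retry_with_backoff",
    "schema_drift": "quarantine_batch_and_notify_owner",
    "upstream_late": "reschedule_run",
}


@dataclass
class Classification:
    category: str
    confidence: float  # 0.0 to 1.0, as reported by the classifier


@dataclass
class Decision:
    action: str
    escalated: bool
    reason: str


def triage(
    log_excerpt: str,
    classify: Callable[[str], Classification],
    min_confidence: float = 0.8,  # assumed threshold; tune it against real incidents
) -> Decision:
    """Route one pipeline failure: remediate if confident, otherwise escalate."""
    result = classify(log_excerpt)
    if result.category not in REMEDIATIONS:
        return Decision("page_on_call", True, f"unknown category: {result.category}")
    if result.confidence < min_confidence:
        return Decision("page_on_call", True, f"low confidence: {result.confidence:.2f}")
    return Decision(REMEDIATIONS[result.category], False, result.category)


def keyword_classifier(log_excerpt: str) -> Classification:
    """Stand-in for an LLM call; a real classifier would return the same shape."""
    text = log_excerpt.lower()
    if "connection reset" in text or "timeout" in text:
        return Classification("transient_network", 0.95)
    if "column" in text and "not found" in text:
        return Classification("schema_drift", 0.85)
    return Classification("unknown", 0.30)
```

The agent never invents a remediation: it can only pick from a fixed table or hand off to a human. Expanding scope later means adding rows to that table, not loosening the escalation rule.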

2. Unity Catalog Is Now the Right Default

If you're starting a new Databricks environment, the question of whether to use Unity Catalog is no longer open. The governance, lineage, and cross-workspace sharing capabilities have matured to the point where the cost of not using it (in future migration effort) exceeds the cost of early adoption friction. The teams that adopted UC early are now the ones with a governance foundation that makes AI features actually work.

3. Open-Weight Models Are Production-Viable

Running Llama 4 and DeepSeek V3.2 in production for classification and extraction tasks is now a legitimate alternative to OpenAI and Anthropic API calls for cost-sensitive, high-volume workloads. The operational overhead is real, but so are the cost savings for the right workload profile.
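One way to check whether a workload fits that profile is a simple break-even calculation. The sketch below assumes a deliberately crude cost model that is mine, not a measured one: self-hosting has a fixed monthly cost (GPUs plus the operational overhead) and a small marginal cost per request, while the hosted API has only a per-request cost. All parameter names are hypothetical, and you would supply your own figures.

```python
import math
from typing import Optional


def monthly_cost_api(requests: int, api_cost_per_request: float) -> float:
    """Hosted API: pay per request, no fixed cost."""
    return requests * api_cost_per_request


def monthly_cost_self_hosted(
    requests: int, fixed_cost: float, marginal_cost_per_request: float
) -> float:
    """Self-hosted open-weight model: fixed infrastructure and ops cost plus a marginal cost."""
    return fixed_cost + requests * marginal_cost_per_request


def break_even_requests(
    fixed_cost: float, api_cost_per_request: float, marginal_cost_per_request: float
) -> Optional[int]:
    """Smallest monthly volume at which self-hosting is no more expensive than the API.

    Returns None when self-hosting never pays off under this model, that is,
    when its marginal cost is not below the API's per-request cost.
    """
    saving_per_request = api_cost_per_request - marginal_cost_per_request
    if saving_per_request <= 0:
        return None
    return math.ceil(fixed_cost / saving_per_request)
```

Below the break-even volume the API is cheaper under this model; above it, the savings grow linearly with volume. The fixed cost should include the engineering time to run the model, which is usually the term people underestimate.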

4. MCP Is the Right Abstraction for Tool Integration

Building tool integrations against MCP servers rather than framework-specific tool definitions has paid off. Write once, use across Claude, LangGraph, Copilot Studio, and whatever framework ships next year.
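The reason this works is that an MCP tool is described by data (a name, a description, and a JSON Schema for its input) rather than by a framework-specific class. The sketch below shows that shape using only the standard library: a dispatcher that answers the `tools/list` and `tools/call` JSON-RPC methods. It is not a complete MCP server. The initialize handshake, the transport, and tool-level error handling are all omitted, and in practice you would use an official SDK. The `row_count` tool and its data are hypothetical.

```python
from typing import Any


def row_count(table: str) -> str:
    """Hypothetical tool body; the dict stands in for a real metadata lookup."""
    counts = {"sales.orders": 1200}
    return str(counts.get(table, 0))


# One definition per tool: the description and schema are what clients discover.
TOOLS: dict[str, dict[str, Any]] = {
    "row_count": {
        "description": "Return the row count for a table.",
        "inputSchema": {
            "type": "object",
            "properties": {"table": {"type": "string"}},
            "required": ["table"],
        },
        "handler": row_count,
    }
}


def handle(request: dict[str, Any]) -> dict[str, Any]:
    """Answer the two tool-related JSON-RPC methods; anything else is rejected."""
    method = request.get("method")
    if method == "tools/list":
        result: dict[str, Any] = {
            "tools": [
                {
                    "name": name,
                    "description": tool["description"],
                    "inputSchema": tool["inputSchema"],
                }
                for name, tool in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        params = request["params"]
        handler = TOOLS[params["name"]]["handler"]
        text = handler(**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": "Method not found"},
        }
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
```

Any client that speaks the protocol can discover and call `row_count` from the listing alone, which is where the write-once property comes from.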

5. The Data Quality Foundation Still Matters Most

Every AI-augmented pipeline I've seen succeed in production was built on top of solid data quality infrastructure. Every one I've seen struggle had data quality debt that the AI couldn't compensate for. The models are good. Good data is still irreplaceable. As always, I'm here to help.
