Eight Years of Azure Data Factory: The Honest Accounting

Eight Years

I started working with Azure Data Factory in 2014. It was in preview. The authoring experience was JSON in a browser. There was no version control. There was no parameterization. There was no CI/CD story. There was a connector list short enough to print on one page.

In 2022, ADF is an enterprise-grade orchestration platform with 100+ connectors, full parameterization, git integration, automated CI/CD, Spark-backed transformation, managed network security, and an integration with Apache Airflow. The gap between 2014 ADF and 2022 ADF is enormous.

Eight years is a good length of time for an honest accounting.

What Microsoft Got Right

The Managed Infrastructure Model

The foundational architectural decision (no servers to manage, managed compute, and delegation to external services rather than building everything in-house) was correct from day one. ADF was never trying to be a compute platform. It was always an orchestration layer that called other services to do the heavy lifting. That decision proved durable. As Azure's service portfolio expanded (Databricks, Synapse, Azure Functions), ADF absorbed each service as an activity type. The orchestrator role became more valuable, not less.

The JSON-Deployable Design

ADF's pipeline definitions are JSON. All of them. Every pipeline, dataset, linked service, trigger, and integration runtime is a JSON document. This design decision, which seems obvious in retrospect, was the foundation of everything the CI/CD story eventually became. You can't have a proper deployment pipeline for a tool that stores its configuration in a proprietary binary format. JSON made ADF deployable.
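To make the point concrete, here is a minimal Python sketch that uses only the standard library. It builds a simplified Copy pipeline in the same name/properties/activities shape that ADF writes to git, then diffs two versions the way a pull request would. The pipeline and dataset names are hypothetical, and sorting keys is this sketch's choice for stable diffs, not something ADF itself does.

```python
import difflib
import json


def copy_pipeline(name: str, source_dataset: str, sink_dataset: str) -> dict:
    """Build a simplified ADF-style pipeline definition as a plain dict.

    The shape (name / properties / activities / typeProperties) follows the
    JSON ADF stores in git; the dataset names passed in are hypothetical.
    """
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyData",
                    "type": "Copy",
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": "DelimitedTextSource"},
                        "sink": {"type": "ParquetSink"},
                    },
                }
            ]
        },
    }


def to_json(definition: dict) -> str:
    # Sorted keys and fixed indentation give a stable, diff-friendly text form.
    return json.dumps(definition, indent=2, sort_keys=True)


def diff_definitions(old: dict, new: dict) -> list[str]:
    """Return a unified diff between two definitions, as a code reviewer would see it."""
    return list(
        difflib.unified_diff(
            to_json(old).splitlines(),
            to_json(new).splitlines(),
            fromfile="before.json",
            tofile="after.json",
            lineterm="",
        )
    )


before = copy_pipeline("PL_Copy_Sales", "DS_Sales_Csv", "DS_Sales_Parquet")
after = copy_pipeline("PL_Copy_Sales", "DS_Sales_Csv", "DS_Sales_Parquet_v2")
for line in diff_definitions(before, after):
    print(line)
```

Swapping the sink dataset shows up as a single changed line in the diff. That is what a reviewer needs, and it is what a binary format could never provide.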

The Connector-First Approach

Building a comprehensive connector library — rather than a more limited set of "officially supported" connectors with a plugin architecture for the rest — meant that ADF earned trust with enterprise integration teams who had diverse source landscapes. The connector library became a moat. It's table stakes now, but it wasn't always.

The v2 Redesign

ADF v1's slice/availability scheduling model was clever but opaque. Understanding why a pipeline run was or wasn't scheduled required understanding the window system, which required reading documentation that still left most people confused. The v2 trigger model — schedule trigger, tumbling window trigger, event trigger — is transparent and predictable. v2 also introduced parameterization, which made everything else possible. The redesign was disruptive (some v1 shops resisted the migration), but it was the right call. v2 is the product that earned enterprise adoption at scale.
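Part of what makes the tumbling window trigger predictable is that its windows are plain arithmetic: fixed-size, contiguous, non-overlapping slices that begin at startTime. Each slice is exposed to the pipeline as @trigger().outputs.windowStartTime and windowEndTime. The rough Python sketch below reproduces that slicing. It is an illustration under stated simplifications: it ignores delays, retries, concurrency, and dependencies, and it handles only the Minute and Hour frequencies.

```python
from datetime import datetime, timedelta, timezone

# Frequencies this sketch supports. ADF's tumbling window trigger also accepts
# "Month", which needs calendar arithmetic and is left out here.
_UNITS = {"Minute": timedelta(minutes=1), "Hour": timedelta(hours=1)}


def tumbling_windows(start_time: datetime, frequency: str, interval: int, until: datetime):
    """Yield (window_start, window_end) pairs the way a tumbling window slices time.

    Windows are fixed-size, contiguous, and non-overlapping, starting at start_time.
    Only windows that have fully closed by `until` are yielded, because a window
    is triggered once its end time has passed.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    size = _UNITS[frequency] * interval
    window_start = start_time
    while window_start + size <= until:
        yield window_start, window_start + size
        window_start += size


start = datetime(2022, 1, 1, tzinfo=timezone.utc)
now = datetime(2022, 1, 1, 7, 30, tzinfo=timezone.utc)
for ws, we in tumbling_windows(start, "Hour", 3, now):
    print(ws.isoformat(), "->", we.isoformat())
```

With a three-hour interval and a clock at 07:30, only the 00:00 and 03:00 windows have closed. The 06:00 window has not, and you can reason about that without any documentation on slices.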

What Microsoft Never Closed

The Metadata-Driven Framework Gap

Eight years. The metadata-driven pattern (a control table, a ForEach loop, a parameterized pipeline template) is the standard approach for any ADF implementation that needs to scale beyond a handful of pipelines. Microsoft's first-party support for it has been thin. It consists of a bulk-copy solution template that reads a control table and a metadata-driven copy task that was added to the Copy Data tool as a preview in 2021. Neither amounts to a framework. The complete versions are community-built and community-maintained. Every ADF shop that uses this pattern has reinvented it, with variations in schema design, parameter naming conventions, and error handling. This is a gap Microsoft could close with a single well-documented reference implementation, and eight years in, it still hasn't.
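For readers who haven't built one, the skeleton of the pattern looks roughly like this. The Python sketch below emits a parent pipeline in ADF's JSON shape. A Lookup activity reads the control table, a ForEach iterates over its rows, and an Execute Pipeline activity passes each row's columns to one generic child pipeline. Every name here (DS_ControlTable, PL_Copy_Generic, etl.ControlTable and its columns) is hypothetical. The sketch illustrates the shape and is not a reference implementation.

```python
import json

# Hypothetical control-table columns, each mapped onto a child-pipeline parameter.
CONTROL_COLUMNS = ("SchemaName", "TableName", "SinkFolder")


def metadata_driven_pipeline(control_dataset: str, child_pipeline: str, batch_count: int = 10) -> dict:
    """Build a parent pipeline: Lookup the control table, fan out with ForEach,
    and call one parameterized child pipeline per row."""
    if not 1 <= batch_count <= 50:  # ADF caps ForEach parallelism at 50
        raise ValueError("batch_count must be between 1 and 50")
    lookup = {
        "name": "LookupControlTable",
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "AzureSqlSource",
                # Hypothetical control table: one enabled row per object to load.
                "sqlReaderQuery": "SELECT SchemaName, TableName, SinkFolder "
                "FROM etl.ControlTable WHERE IsEnabled = 1",
            },
            "dataset": {"referenceName": control_dataset, "type": "DatasetReference"},
            "firstRowOnly": False,  # return every row as output.value
        },
    }
    # @item() refers to the current control-table row inside the ForEach.
    child_params = {col: {"value": f"@item().{col}", "type": "Expression"} for col in CONTROL_COLUMNS}
    for_each = {
        "name": "ForEachControlRow",
        "type": "ForEach",
        "dependsOn": [{"activity": "LookupControlTable", "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "items": {"value": "@activity('LookupControlTable').output.value", "type": "Expression"},
            "isSequential": False,
            "batchCount": batch_count,  # maximum parallel iterations
            "activities": [
                {
                    "name": "RunChildPipeline",
                    "type": "ExecutePipeline",
                    "typeProperties": {
                        "pipeline": {"referenceName": child_pipeline, "type": "PipelineReference"},
                        "parameters": child_params,
                        "waitOnCompletion": True,
                    },
                }
            ],
        },
    }
    return {"name": "PL_Metadata_Driven_Parent", "properties": {"activities": [lookup, for_each]}}


print(json.dumps(metadata_driven_pipeline("DS_ControlTable", "PL_Copy_Generic"), indent=2))
```

The control-table schema, the parameter names, and the behavior when one iteration fails are exactly the decisions each shop makes differently. Microsoft's documentation also caps Lookup output at 5,000 rows, so very large control tables need paging or nesting.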

The Automated Publish Gap

The manual Publish button in ADF Studio was the wrong design decision, and it took until 2022 to have a fully supported automated alternative. The npm package solution works, but the gap from 2018 (git integration GA) to 2022 (automated publish npm package mature and well-documented) is four years of CI/CD pain that shouldn't have happened. The Publish button should have been automatable from the day git integration shipped.
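For reference, the automated flow replaces the button with two build steps. Both run the @microsoft/azure-data-factory-utilities npm package against the JSON in the collaboration branch: first validate, then export ARM templates. The Python sketch below mostly just assembles those commands. It assumes a package.json whose "build" script invokes that package, as Microsoft's CI/CD documentation describes. The subscription, resource group, factory, and folder names are placeholders.

```python
import shutil
import subprocess


def factory_resource_id(subscription_id: str, resource_group: str, factory_name: str) -> str:
    """Azure resource ID of a data factory, which the npm build commands take as an argument."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    )


def publish_commands(root_folder: str, factory_id: str, output_folder: str = "ArmTemplate") -> list[list[str]]:
    """Commands a CI agent runs instead of a human clicking Publish:
    validate the git-stored JSON, then export ARM templates from it.

    Assumes package.json defines a "build" script that runs
    @microsoft/azure-data-factory-utilities.
    """
    return [
        ["npm", "run", "build", "validate", root_folder, factory_id],
        ["npm", "run", "build", "export", root_folder, factory_id, output_folder],
    ]


def run_publish(root_folder: str, factory_id: str) -> None:
    """Run the commands in order; check=True stops at the first failure."""
    npm = shutil.which("npm") or "npm"  # resolves npm.cmd on Windows agents
    for cmd in publish_commands(root_folder, factory_id):
        subprocess.run([npm, *cmd[1:]], check=True)


if __name__ == "__main__":
    fid = factory_resource_id("<subscription-id>", "<resource-group>", "<factory-name>")
    for cmd in publish_commands("<adf-root-folder>", fid):
        print(" ".join(cmd))
```

The exported ARM templates then go through an ordinary deployment stage, and no one has to open ADF Studio to release.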

The Monitoring UX

ADF Monitor shows you pipeline run history. It doesn't alert you when pipelines fail. Configuring meaningful alerting requires Azure Monitor, custom metrics, alert rules, and action groups — a separate configuration surface that most ADF developers don't fully understand, resulting in production pipeline failures that no one notices for hours. This is still true in 2022. Alerting should be first-class in ADF Studio, not an external configuration requirement.
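To show how much configuration sits outside ADF Studio, here is a sketch of the minimum Azure Monitor piece, written as a Python dict in ARM template form. It defines a metric alert on the factory's PipelineFailedRuns metric that notifies an action group. The rule name, severity, and five-minute window are illustrative assumptions. The action group it points to is a separate resource that you still have to create.

```python
import json


def pipeline_failure_alert(factory_id: str, action_group_id: str, window: str = "PT5M") -> dict:
    """ARM resource for an Azure Monitor metric alert that fires when any pipeline run fails.

    Uses the Data Factory platform metric "PipelineFailedRuns"; the rule name,
    severity, and window are illustrative choices, not recommendations.
    """
    return {
        "type": "Microsoft.Insights/metricAlerts",
        "apiVersion": "2018-03-01",
        "name": "adf-pipeline-failed-runs",
        "location": "global",  # metric alert rules are global resources
        "properties": {
            "description": "Any failed pipeline run in the factory",
            "severity": 1,
            "enabled": True,
            "scopes": [factory_id],
            "evaluationFrequency": window,
            "windowSize": window,
            "criteria": {
                "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
                "allOf": [
                    {
                        "criterionType": "StaticThresholdCriterion",
                        "name": "FailedRuns",
                        "metricNamespace": "Microsoft.DataFactory/factories",
                        "metricName": "PipelineFailedRuns",
                        "operator": "GreaterThan",
                        "threshold": 0,
                        "timeAggregation": "Total",
                    }
                ],
            },
            # The action group (email, SMS, webhook, and so on) is yet another resource.
            "actions": [{"actionGroupId": action_group_id}],
        },
    }


alert = pipeline_failure_alert("<factory-resource-id>", "<action-group-resource-id>")
print(json.dumps(alert, indent=2))
```

That is one rule, one metric, and one notification target, and none of it lives in the factory's own git repository unless you put it there yourself.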

The 80% Pattern

Every year, I've said the same thing: ADF gets you 80% of the way to a complete enterprise data integration platform. The last 20% (a metadata-driven framework, CI/CD automation, monitoring and alerting) is filled in by the community, by custom tooling, and by individual shops building what Microsoft didn't.

In 2022, the remaining gap is smaller. The CI/CD story is now 95% solved. The connector coverage is 100% for practical purposes. Data Flows add Spark transformation without the Databricks prerequisite. The 80% has become 90%.

But the last 10% (first-party metadata framework templates and native alerting) is still open. I'm now watching to see whether it closes in Fabric, or whether the 80-90% pattern repeats for another decade.

My bet: it repeats. But I've been wrong about Microsoft before.

Here's to year nine. As always, I'm here to help you close the gap that Microsoft left open.
