ADF's Legacy: What It Got Right When Cloud ETL Was New

The Moment Before Everything Was Obvious

It's easy to look at ADF in 2023 — mature, enterprise-grade, absorbed into Microsoft Fabric — and treat its design decisions as obvious. Of course you make everything JSON. Of course you use managed compute. Of course you separate orchestration from execution.

Those decisions were not obvious in 2014. Cloud ETL was not a proven category. The right architecture for cloud-managed data integration was genuinely unknown. ADF made bets. Some of them were correct in ways that only became apparent years later. Some of them aged poorly and required course correction. And one persistent gap — the metadata-driven framework — never got closed regardless of what else improved.

Let me do the accounting while there's still time before Fabric fully absorbs the ADF identity.

What ADF Got Right

JSON-Defined Pipelines

The foundational decision: every ADF object — pipeline, dataset, linked service, trigger, integration runtime — is a JSON document. The authoring UI is a visual editor on top of JSON; the JSON is always accessible, always the source of truth.

This decision was the prerequisite for everything the CI/CD story eventually became. You cannot have proper version control, PR review, or deployment automation for a tool whose configuration is binary or proprietary. JSON made ADF auditable, diffable, and deployable.

In 2014, many cloud services did not make this choice. ADF's choice was correct, and it proved durable: the JSON-first object model introduced in 2014 is still recognizable in Fabric Data Factory today.
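To make "diffable" concrete, here is a minimal sketch of what a reviewer gains when the pipeline is plain JSON. The two pipeline documents are hypothetical and heavily trimmed (not a complete ADF schema), and the helper names `canonical` and `pipeline_diff` are made up for illustration. The only idea shown is that a stable serialization plus a text diff surfaces exactly what changed.

```python
import difflib
import json


def canonical(doc: dict) -> list[str]:
    """Serialize with sorted keys and fixed indentation so diffs are stable."""
    return json.dumps(doc, indent=2, sort_keys=True).splitlines()


def pipeline_diff(old: dict, new: dict) -> list[str]:
    """Return a unified diff between two pipeline definitions."""
    return list(
        difflib.unified_diff(
            canonical(old), canonical(new),
            fromfile="before", tofile="after", lineterm="",
        )
    )


# Hypothetical, trimmed pipeline definitions: only the retry policy differs.
before = {
    "name": "CopyOrders",
    "properties": {
        "activities": [
            {"name": "CopyFromSql", "type": "Copy", "policy": {"retry": 0}}
        ]
    },
}
after = {
    "name": "CopyOrders",
    "properties": {
        "activities": [
            {"name": "CopyFromSql", "type": "Copy", "policy": {"retry": 3}}
        ]
    },
}

# Keep only the changed lines, dropping the "---"/"+++" file headers.
changes = [
    line for line in pipeline_diff(before, after)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

for line in changes:
    print(line)
```

A binary or proprietary configuration format gives a reviewer nothing comparable: the change to the retry policy would be invisible in a pull request.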

The Managed Infrastructure Model

ADF was never trying to be a compute platform. The Azure Integration Runtime handles compute automatically. ADF orchestrates; other services compute. Databricks runs the Spark. Azure SQL runs the stored procedures. Azure Functions runs the custom code. ADF calls all of them and manages the dependencies between them.

This delegation model meant that as Azure's service portfolio expanded, ADF's capability expanded proportionally — without ADF having to build the compute capabilities itself. Every new Azure service that shipped with a REST API or a native ADF connector made ADF more powerful without requiring ADF team investment. The orchestrator role became more valuable over time, not less.
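The orchestrate-and-delegate split can be sketched in a few lines. The activity list below is a hypothetical, trimmed imitation of how pipeline activities declare `dependsOn`. The ordering logic uses Python's standard `graphlib` as a stand-in; it is an illustration of the idea, not a description of ADF's actual scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical activities: each one delegates its work to another service.
# Only the dependency edges matter to the orchestrator.
activities = [
    {
        "name": "LoadWarehouse",
        "type": "SqlServerStoredProcedure",
        "dependsOn": [
            {"activity": "TransformInSpark", "dependencyConditions": ["Succeeded"]}
        ],
    },
    {
        "name": "TransformInSpark",
        "type": "DatabricksNotebook",
        "dependsOn": [
            {"activity": "CopyRawFiles", "dependencyConditions": ["Succeeded"]}
        ],
    },
    {"name": "CopyRawFiles", "type": "Copy", "dependsOn": []},
]


def execution_order(acts: list[dict]) -> list[str]:
    """Order activities so each one runs after everything it depends on."""
    graph = {
        a["name"]: {dep["activity"] for dep in a.get("dependsOn", [])}
        for a in acts
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(activities)
print(order)
```

Nothing in the ordering logic knows what Spark or a stored procedure is. Adding a new kind of compute means adding a new activity type, while the dependency handling stays untouched.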

The Linked Service / Dataset / Activity Separation

The three-layer model: linked services define connections, datasets define the data structure and location, activities use datasets to move or transform data. Clean separation of concerns. A linked service can be shared across dozens of datasets. A dataset can be shared across multiple pipelines. Parameterize the linked service and the dataset, and a single definition can serve a hundred tables.

This separation is what makes the metadata-driven framework pattern possible. Without parameterized datasets and linked services, each table requires a separate dataset. With parameterization, one dataset template handles all tables with the same structure.
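As a rough illustration of the "one dataset template, many tables" idea, the sketch below models a parameterized dataset as a Python dict and expands it over a small metadata list. The dataset and linked service names are hypothetical, and the `resolve` function is a toy substitute that only handles `@dataset().<name>` references. It is not ADF's expression engine.

```python
import re

# Hypothetical parameterized dataset, trimmed to the parts that matter here.
DATASET_TEMPLATE = {
    "name": "GenericSqlTable",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "SourceSqlDb",  # assumed linked service name
            "type": "LinkedServiceReference",
        },
        "parameters": {
            "schemaName": {"type": "String"},
            "tableName": {"type": "String"},
        },
        "typeProperties": {
            "schema": {"value": "@dataset().schemaName", "type": "Expression"},
            "table": {"value": "@dataset().tableName", "type": "Expression"},
        },
    },
}

_PARAM_REF = re.compile(r"@dataset\(\)\.(\w+)")


def resolve(template: dict, params: dict[str, str]) -> dict[str, str]:
    """Substitute parameter values into the template's typeProperties."""
    declared = template["properties"]["parameters"]
    missing = set(declared) - set(params)
    if missing:
        raise KeyError(f"missing dataset parameters: {sorted(missing)}")
    return {
        key: _PARAM_REF.sub(lambda m: params[m.group(1)], prop["value"])
        for key, prop in template["properties"]["typeProperties"].items()
    }


# The metadata table: one row per source table, all served by one template.
tables = [
    {"schemaName": "sales", "tableName": "orders"},
    {"schemaName": "sales", "tableName": "customers"},
]

resolved = [resolve(DATASET_TEMPLATE, t) for t in tables]
for r in resolved:
    print(r)
```

Adding a table becomes a change to the metadata list, not a new dataset definition.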

The Copy Activity

The Copy Activity is the most underappreciated component of ADF. It looks simple — move data from source to sink. Under the hood, it handles parallel reads, partition strategies, schema mapping, type casting, fault tolerance, staging for data warehouses, format conversion, and throughput optimization. It's not just an HTTP GET that writes to Azure Storage. It's a high-throughput, fault-tolerant, schema-aware data mover that handles the cases that break naive implementations.

Every data platform needs a component that "just moves data reliably at scale." ADF built that component and made it the centerpiece. The 100+ connector library is valuable specifically because the Copy Activity is worth using for all of them.

What Aged Poorly

The v1 Slice/Availability Model

ADF v1's scheduling model was conceptually interesting: pipelines operated on time-sliced data windows, and each slice had an availability that governed whether it was ready to be processed. In practice, this model was opaque and unintuitive, and it demanded a mental model that most users never fully developed. Understanding why a pipeline run wasn't triggered required understanding slice dependency chains that even experienced ADF v1 users regularly got wrong.

The v2 trigger model — schedule trigger, tumbling window trigger, event trigger — replaced this with something transparent. A trigger fires, a pipeline runs. The simplification was the right call.
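For a sense of why the v2 model is easier to reason about, here is a minimal sketch of the tumbling-window idea: fixed-size, contiguous, non-overlapping intervals, each of which maps to one pipeline run. The function name and the rule of emitting only fully elapsed windows are assumptions made for illustration, not a description of the service's internals.

```python
from datetime import datetime, timedelta, timezone


def tumbling_windows(
    start: datetime, interval: timedelta, until: datetime
) -> list[tuple[datetime, datetime]]:
    """Return every complete [window_start, window_end) pair up to `until`."""
    windows = []
    window_start = start
    while window_start + interval <= until:
        windows.append((window_start, window_start + interval))
        window_start += interval
    return windows


start = datetime(2023, 1, 1, 0, 0, tzinfo=timezone.utc)
until = datetime(2023, 1, 1, 3, 30, tzinfo=timezone.utc)
windows = tumbling_windows(start, timedelta(hours=1), until)

for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window is fully described by its two timestamps, so "why did this run happen?" has a one-line answer. The v1 slice model never offered that.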

No Git Integration at Launch

ADF launched in 2014 without Git integration. Git integration arrived in 2018, four years later. That meant four years of ADF development with no built-in version control, no PR workflow, and no audit trail for pipeline changes. For an enterprise data integration tool, this was a significant gap. The teams building serious ADF implementations built their own version control workarounds (export JSON via REST API, commit to Git, deploy via ARM) for four years before Git integration became first-party.

The adf_publish Branch Model

When git integration shipped, it came with a design decision that created years of CI/CD friction: the adf_publish branch. The Publish button in ADF Studio generates ARM templates in a separate branch. The CI/CD pipeline deploys those ARM templates. The intermediate artifact — the machine-generated ARM template — is checked into git, creating a noisy branch that no one should read but everyone has to work around.

This was a workaround masquerading as a feature. The Fabric Data Factory git model — direct workspace sync, no intermediate ARM template artifact — is the design that should have shipped in 2018. ADF teams lived with the suboptimal model for five years.

The Net Verdict

ADF made the right foundational bets and the wrong implementation choices on its own tooling. The managed infrastructure model, the JSON-deployable design, the connector-first approach, the Copy Activity — these are durable architectural decisions that age well. The v1 scheduling model, the delayed git integration, the adf_publish branch — these are regrettable choices that the community and the product team spent years working around.

What makes ADF's legacy positive, on balance, is that the right decisions were in the foundation. The wrong decisions were in the operational tooling, which could be worked around and has been replaced. A tool with bad foundations and good tooling is a much harder problem to solve.

ADF had good foundations. The rest was workable.
