ADF in 2021: Mature Platform, Persistent Gaps

Seven years since ADF was in preview. Time for an honest inventory.

I've been running ADF in production for the better part of that timeline. The platform has changed substantially. My view of it has changed substantially. Here's where things actually stand in 2021 -- what's solved, what's managed, and what's still genuinely broken.

What's Fully Solved

Managed Infrastructure

I have not managed an ADF server in seven years. Zero capacity planning. Zero patching. Zero "the ETL server is down" incidents. For organizations that remember babysitting SSIS servers or Informatica appliances, this is not a small thing -- it's the fundamental value proposition of the platform. It delivers.

Connector Breadth

90+ connectors covering every Azure service, all major relational databases, the major SaaS platforms (Salesforce, Dynamics, SAP, ServiceNow), and all relevant file formats including Delta Lake. The connector gap story from ADF's early years is a historical artifact. The conversation has moved to design patterns, not connectivity.

CI/CD

The ARM template CI/CD pattern is well-understood. The pieces: Git integration on the Dev factory only, a collaboration branch, ARM template generation on Publish, environment-specific parameter files, and pre/post-deployment scripts that stop triggers before the deployment and restart them afterwards. This pattern is stable and I've built it for a dozen clients. Teams that adopt it have repeatable, auditable deployments.
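To make two of those pieces concrete, here is a minimal Python sketch, not the scripts I actually ship. It assumes the exported template has already been loaded as a dict. The function names, the trigger names, and the parameter names in the usage example are hypothetical. The first function wraps per-environment overrides in the standard ARM parameter-file envelope; the second lists the triggers a pre-deployment step would need to stop, based on the runtimeState recorded in the template.

```python
PARAMETER_SCHEMA = (
    "https://schema.management.azure.com/schemas/2019-04-01/"
    "deploymentParameters.json#"
)


def build_parameter_file(overrides: dict) -> dict:
    """Wrap environment-specific values in the ARM parameter-file envelope."""
    return {
        "$schema": PARAMETER_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in overrides.items()},
    }


def trigger_name(resource_name: str) -> str:
    """Extract 'DailyLoad' from "[concat(parameters('factoryName'), '/DailyLoad')]"."""
    return resource_name.rsplit("/", 1)[-1].rstrip("')]")


def triggers_to_stop(template: dict) -> list:
    """Names of triggers marked as Started in the template (hypothetical helper)."""
    return [
        trigger_name(res["name"])
        for res in template.get("resources", [])
        if res.get("type") == "Microsoft.DataFactory/factories/triggers"
        and res.get("properties", {}).get("runtimeState") == "Started"
    ]
```

A usage example with made-up values: build_parameter_file({"factoryName": "adf-prod"}) yields a dict you can dump to a per-environment JSON file, and triggers_to_stop(template) returns the list your pre-deployment step would iterate over. The actual stop and start calls are left out on purpose because they depend on the tooling you deploy with.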

Parameterized Pipelines and Metadata-Driven Frameworks

The ForEach plus parameterized pipeline pattern enables metadata-driven architectures where adding a new source table takes minutes rather than days. This is community knowledge rather than a first-party Microsoft pattern, but it's so well-understood that it's effectively the standard approach for any serious ADF deployment.
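For readers who haven't seen the pattern, here is a rough Python sketch that assembles the pipeline definition as a dict. It is an illustration under assumptions, not a reference implementation: the control table etl.SourceTables, its columns, the dataset and pipeline names, and the child pipeline's schemaName/tableName parameters are all hypothetical. The activity types and expression syntax follow the pipeline JSON that ADF stores in Git.

```python
import json


def build_metadata_pipeline(control_dataset: str, child_pipeline: str,
                            batch_count: int = 4) -> dict:
    """Lookup a control table, then fan out one child pipeline run per row."""
    lookup = {
        "name": "LookupTables",
        "type": "Lookup",
        "typeProperties": {
            # Hypothetical control table listing the tables to ingest.
            "source": {
                "type": "AzureSqlSource",
                "sqlReaderQuery": "SELECT SchemaName, TableName "
                                  "FROM etl.SourceTables WHERE Enabled = 1",
            },
            "dataset": {"referenceName": control_dataset, "type": "DatasetReference"},
            "firstRowOnly": False,  # return every row, not just the first
        },
    }
    for_each = {
        "name": "ForEachTable",
        "type": "ForEach",
        "dependsOn": [{"activity": "LookupTables",
                       "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "items": {"value": "@activity('LookupTables').output.value",
                      "type": "Expression"},
            "isSequential": False,
            "batchCount": batch_count,  # parallel child runs
            "activities": [{
                "name": "LoadOneTable",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {"referenceName": child_pipeline,
                                 "type": "PipelineReference"},
                    "waitOnCompletion": True,
                    "parameters": {
                        "schemaName": {"value": "@item().SchemaName",
                                       "type": "Expression"},
                        "tableName": {"value": "@item().TableName",
                                      "type": "Expression"},
                    },
                },
            }],
        },
    }
    return {"name": "PL_Load_All_Tables",
            "properties": {"activities": [lookup, for_each]}}


if __name__ == "__main__":
    print(json.dumps(build_metadata_pipeline("DS_Control", "PL_Load_Table"), indent=2))
```

The point of the shape is that onboarding a new table becomes an INSERT into the control table rather than a new pipeline. Everything table-specific arrives through @item().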

Network Security

Managed Virtual Network and Managed Private Endpoints closed the last major enterprise security objection. For Azure-to-Azure connectivity, ADF can now reach private Azure resources without self-hosted integration runtime (SHIR) VMs. This was a genuine gap through 2019 that is now solved.

Data Flows for Spark Transformation

Mapping Data Flows in 2021 are production-grade. The Spark execution model scales, the integration runtime's time-to-live (TTL) setting keeps clusters warm between runs so frequent pipelines avoid most cold starts, the partition tuning levers are effective, and the transformation set covers complex scenarios including aggregation, joins, lookup, ranking, and conditional logic. I'm running Data Flows on 50M+ row datasets in production.

What's Managed (Rough Edges, Not Blockers)

The Monitoring Story

ADF's built-in Monitor tab shows pipeline run history and lets you drill into activity runs. It's adequate for debugging individual failures. For production monitoring -- failure alerting, SLA tracking, trend analysis -- you need Azure Monitor integration. This isn't hard to set up, but it's an external dependency that new teams consistently miss. You should configure it before you go live, not after your first production failure.
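As a rough illustration of what "SLA tracking" means once run history has been exported out of ADF, here is a small Python sketch. It assumes you already have run records with a pipeline name, status, start, and end. The PipelineRun class, its field names, and the per-pipeline SLA mapping are hypothetical and do not mirror any Azure schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PipelineRun:
    pipeline_name: str
    status: str        # e.g. "Succeeded" or "Failed"
    start: datetime
    end: datetime


def summarize(runs, sla):
    """Count runs, failures, and SLA breaches per pipeline.

    `sla` maps pipeline name -> maximum allowed duration; pipelines
    without an entry are never counted as breaching.
    """
    summary = {}
    for run in runs:
        stats = summary.setdefault(
            run.pipeline_name, {"runs": 0, "failed": 0, "sla_breaches": 0}
        )
        stats["runs"] += 1
        if run.status == "Failed":
            stats["failed"] += 1
        elif run.end - run.start > sla.get(run.pipeline_name, timedelta.max):
            stats["sla_breaches"] += 1
    return summary
```

In a real setup the same aggregation would more likely live in an alert rule or a workbook query than in Python. The sketch only shows the two numbers worth alerting on: failures and successful-but-late runs.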

The ARM Template Size Problem

For ADF factories with 100+ resources, the ARM template generated by the Publish operation is unwieldy. Large ARM templates run into deployment limits (ARM caps a single template at 4 MB and 800 resources), are difficult to diff in code review, and slow down the Azure DevOps (ADO) pipeline. The workaround is partial ARM template deployment, but this requires tooling that isn't first-party.
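A cheap guard is to check the generated template in the build before it ever reaches a deployment. The Python sketch below is one way to do that, assuming the template has been loaded as a dict. The function name and the 80% warning threshold are my own choices, and the serialized size is only an approximation of what ARM measures.

```python
import json

ARM_MAX_BYTES = 4 * 1024 * 1024   # ARM template size limit: 4 MB
ARM_MAX_RESOURCES = 800           # ARM limit on resources per template


def check_template(template: dict, warn_ratio: float = 0.8) -> dict:
    """Report how close a template is to the ARM size and resource limits."""
    size_bytes = len(json.dumps(template).encode("utf-8"))
    resource_count = len(template.get("resources", []))
    # Express each measure as a fraction of its limit and keep the worse one.
    usage = max(size_bytes / ARM_MAX_BYTES, resource_count / ARM_MAX_RESOURCES)
    return {
        "size_bytes": size_bytes,
        "resource_count": resource_count,
        "near_limit": usage >= warn_ratio,
        "over_limit": usage > 1.0,
    }
```

Failing the build on over_limit and warning on near_limit gives the team some notice before the factory outgrows a single-template deployment.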

What's Still Genuinely Broken

The Manual Publish Step

This is the one that still genuinely frustrates me. When a developer merges a PR to the collaboration branch under ADF Git integration, nothing happens to the adf_publish branch. The ARM templates are not regenerated. CI/CD does not trigger. Someone has to go to ADF Studio, verify they're on the collaboration branch, and click Publish.

There is no native way to automate this from inside ADF. The workaround (the @microsoft/azure-data-factory-utilities npm package, run from a build pipeline) works but requires Azure DevOps pipeline configuration that's external to ADF and a dependency on an npm package you have to install and version yourself. Three years after git integration shipped, the Publish step in ADF Studio is still manual. This is a first-party gap.

No First-Party Metadata Framework

The metadata-driven pattern -- Lookup to ForEach to parameterized pipeline -- is the de facto standard for production ADF. Microsoft has never shipped a first-party template, quickstart, or reference implementation for it. The documentation acknowledges that parameterized pipelines exist; it doesn't show you how to build a complete metadata-driven framework. The community did that work. Microsoft accepted the outcome without doing the work themselves.

The Bottom Line

ADF in 2021 is a production-grade enterprise data integration platform. The infrastructure story is proven. The connector breadth is comprehensive. The architectural patterns are understood. The rough edges are real and I've documented them, but they're manageable with good design decisions.

The platform deserves the production workloads it's running. The manual Publish step and the absence of first-party metadata framework documentation are genuine gaps that Microsoft should have closed by now. But they're not blockers.

If you're evaluating ADF for a new project or trying to mature an existing deployment, I'm here to help.
