The Client Question That Forces Honesty
A client called me in January to talk about their new data platform. Greenfield. Azure. No legacy debt. The question: should we build on Azure Data Factory or Synapse Pipelines? Both are on the table. Give us a recommendation.
I've been running ADF in production since 2014. Eight years. I know this product. And I still had to think before answering, because the honest answer in 2022 is more complicated than "use ADF" or "use Synapse Pipelines."
So let me do what that conversation forced me to do: a clear-eyed accounting of where ADF actually stands in 2022.
What ADF Delivers in 2022
The platform works. That's the first thing to say. If you're building on ADF today, you're building on something stable, mature, and genuinely capable of enterprise-scale data integration. Here's what "works" means in concrete terms:
Managed Infrastructure
You provision a data factory. For cloud workloads, you never provision a server: the Azure Integration Runtime handles compute allocation automatically. The Self-Hosted Integration Runtime, installed on a machine inside your network, connects outbound to ADF, so your pipelines can reach on-premises sources without a VPN gateway. The Azure-SSIS Integration Runtime runs legacy SSIS packages on managed infrastructure. Apart from the machines hosting self-hosted runtimes, there are zero servers to patch, resize, or babysit.
I remember 2014, when the SSIS equivalent of this was running packages in Azure VMs you managed yourself. The improvement is fundamental, not incremental.
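The three runtimes split along one question: where does the work have to run? Here's that rule of thumb as a small Python function. It's my simplification, not an official decision tree, and the `Workload` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Runtime(Enum):
    AZURE = "Azure Integration Runtime"
    SELF_HOSTED = "Self-Hosted Integration Runtime"
    AZURE_SSIS = "Azure-SSIS Integration Runtime"


@dataclass
class Workload:
    # Hypothetical description of a workload; this is not an ADF type.
    name: str
    is_ssis_package: bool = False
    reachable_from_azure: bool = True  # False for sources behind a corporate firewall


def pick_runtime(workload: Workload) -> Runtime:
    """Rough rule of thumb for which integration runtime executes a workload."""
    if workload.is_ssis_package:
        # Legacy SSIS packages run on the managed Azure-SSIS IR.
        return Runtime.AZURE_SSIS
    if not workload.reachable_from_azure:
        # The self-hosted IR runs on a machine inside your network and connects
        # outbound to ADF, so no inbound ports or VPN gateway are needed.
        return Runtime.SELF_HOSTED
    # Cloud-reachable sources and sinks default to the fully managed Azure IR.
    return Runtime.AZURE


for w in [
    Workload("blob-to-lake"),
    Workload("onprem-sql-extract", reachable_from_azure=False),
    Workload("legacy-etl", is_ssis_package=True),
]:
    print(w.name, "->", pick_runtime(w).value)
```

Real projects blur these lines (an SSIS package that reads on-premises data needs network plumbing too), but this is the first cut I make.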
100+ Connectors
The connector library crossed 100 in 2022. Every major Azure service. Every major RDBMS and data warehouse. Salesforce, Dynamics, SAP, ServiceNow, HubSpot, Marketo, Snowflake, Google Analytics, and more SaaS sources than I can enumerate. S3, GCS, ADLS Gen2, Azure Blob. REST connector for everything else, plus ODBC for anything with a driver.
In 2014, the connector list was short enough to memorize. Today, the connector problem is solved. It's not a selling point anymore — it's expected.
Parameterized Pipelines and Metadata-Driven Patterns
ADF v2 introduced full parameterization: pipelines, datasets, and linked services all accept parameters, and triggers pass values into pipeline parameters. Combined with the Lookup and ForEach activities, this enables the metadata-driven framework pattern I've been building on for years: a Lookup reads a control table, and ForEach runs one parameterized template per row. One pipeline template handles 200 tables; the metadata drives it. The pattern works. What's missing is a first-party framework around it: beyond simple copy scenarios, you build and maintain the metadata store yourself.
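Here's the skeleton of that pattern as pipeline JSON, generated from Python so the structure is easy to see. It's a minimal sketch: the dataset names (`ControlTableDataset`, `GenericSqlTable`, `GenericLakeParquet`), the `etl.ControlTable` table, and its columns are hypothetical placeholders, and the source and sink types assume an Azure SQL source landing Parquet in the lake.

```python
import json


def expr(value: str) -> dict:
    """Wrap a string as an ADF dynamic-content expression."""
    return {"value": value, "type": "Expression"}


def metadata_driven_pipeline(name: str) -> dict:
    """Lookup reads the control table; ForEach fans out one Copy per returned row."""
    lookup = {
        "name": "LookupControlTable",
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "AzureSqlSource",
                "sqlReaderQuery": "SELECT SourceSchema, SourceTable, TargetFolder "
                                  "FROM etl.ControlTable WHERE IsEnabled = 1",
            },
            "dataset": {"referenceName": "ControlTableDataset", "type": "DatasetReference"},
            "firstRowOnly": False,  # return every row, not just the first
        },
    }
    copy_one_table = {
        "name": "CopyTable",
        "type": "Copy",
        # Dataset parameters are filled from the current ForEach item.
        "inputs": [{
            "referenceName": "GenericSqlTable",
            "type": "DatasetReference",
            "parameters": {
                "SchemaName": expr("@item().SourceSchema"),
                "TableName": expr("@item().SourceTable"),
            },
        }],
        "outputs": [{
            "referenceName": "GenericLakeParquet",
            "type": "DatasetReference",
            "parameters": {"FolderPath": expr("@item().TargetFolder")},
        }],
        "typeProperties": {
            "source": {"type": "AzureSqlSource"},
            "sink": {"type": "ParquetSink"},
        },
    }
    for_each = {
        "name": "ForEachTable",
        "type": "ForEach",
        "dependsOn": [{"activity": "LookupControlTable", "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "items": expr("@activity('LookupControlTable').output.value"),
            "isSequential": False,
            "batchCount": 10,  # parallel iterations; tune to what the sources can take
            "activities": [copy_one_table],
        },
    }
    return {"name": name, "properties": {"activities": [lookup, for_each]}}


if __name__ == "__main__":
    print(json.dumps(metadata_driven_pipeline("PL_CopyFromControlTable"), indent=2))
```

Adding table 201 means inserting a row in the control table, not touching the pipeline.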
Data Flows
Spark-backed transformation inside ADF Studio. GA since 2019, three years of production hardening behind it. If you need SCD Type 2 logic, aggregation pipelines, complex joins — Data Flows handle it without spinning up a Databricks cluster yourself. Not the right tool for every transformation, but right for many.
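If SCD Type 2 is unfamiliar, this is the logic a Data Flow has to implement, written as plain Python rather than Data Flow script. It illustrates the rules, not how Data Flows executes them. The row layout (`business_key`, `attrs`, `valid_from`, `valid_to`, `is_current`) is a hypothetical convention. In a Data Flow you'd typically express the same steps with Lookup, Conditional Split, Derived Column, and Alter Row transformations.

```python
from datetime import date

# Open-ended end date for the current version of a row (a common convention).
HIGH_DATE = date(9999, 12, 31)


def scd2_merge(dimension: list[dict], incoming: dict[str, dict], load_date: date) -> list[dict]:
    """Apply SCD Type 2 rules and return the new set of dimension rows.

    dimension: existing rows with business_key, attrs, valid_from, valid_to, is_current
    incoming:  latest source snapshot, business_key -> attribute dict
    """
    # Expired history rows are never modified.
    result = [dict(r) for r in dimension if not r["is_current"]]
    current = {r["business_key"]: r for r in dimension if r["is_current"]}

    for key, row in current.items():
        new_attrs = incoming.get(key)
        if new_attrs is None or new_attrs == row["attrs"]:
            result.append(dict(row))  # unchanged, or absent from the feed: keep as-is
        else:
            # Attributes changed: expire the old version instead of overwriting it.
            result.append({**row, "valid_to": load_date, "is_current": False})

    for key, attrs in incoming.items():
        old = current.get(key)
        if old is None or old["attrs"] != attrs:
            # New key, or a changed key: insert a fresh current version.
            result.append({
                "business_key": key,
                "attrs": dict(attrs),
                "valid_from": load_date,
                "valid_to": HIGH_DATE,
                "is_current": True,
            })
    return result


dim = [{"business_key": "C1", "attrs": {"city": "Oslo"},
        "valid_from": date(2021, 1, 1), "valid_to": HIGH_DATE, "is_current": True}]
feed = {"C1": {"city": "Bergen"}, "C2": {"city": "Rome"}}
for row in scd2_merge(dim, feed, date(2022, 1, 15)):
    print(row["business_key"], row["attrs"]["city"], row["valid_from"], row["valid_to"], row["is_current"])
```

Customer C1 ends up with two rows (Oslo expired, Bergen current), and C2 arrives as a new current row. Running the same feed again changes nothing.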
Git Integration and CI/CD
ADF has native git integration (Azure DevOps or GitHub). The adf_publish branch / ARM template CI/CD model is workable, especially now that automated publish, via Microsoft's npm package, validates the factory and generates the ARM templates inside the build pipeline instead of depending on someone clicking Publish. It's not elegant, but it's reliable.
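In practice, the npm route swaps the button click for a build step that runs the package's `export` command against the factory JSON in your collaboration branch. The sketch below only assembles that command line. It assumes a package.json whose `build` script runs the package's entry point, as described in Microsoft's CI/CD documentation at the time of writing. The subscription, resource group, factory, and folder names are placeholders, and you should check the argument order against the package README before relying on it.

```python
import shlex


def adf_factory_id(subscription_id: str, resource_group: str, factory_name: str) -> str:
    """Azure resource ID of a data factory, the form the export command takes."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    )


def export_command(root_folder: str, factory_id: str, output_folder: str = "ArmTemplate") -> list[str]:
    """Argument list for the package's `export` command.

    Assumes package.json defines a "build" script that runs
    node node_modules/@microsoft/azure-data-factory-utilities/lib/index
    """
    return ["npm", "run", "build", "export", root_folder, factory_id, output_folder]


factory_id = adf_factory_id("00000000-0000-0000-0000-000000000000", "rg-data", "adf-dev")
cmd = export_command("adf-root", factory_id)
# A build agent would run this with subprocess.run(cmd, check=True) and publish
# the output folder as the artifact the release stage deploys.
print(shlex.join(cmd))
```

The point is that ARM generation becomes a reproducible build output rather than a side effect of a UI action.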
Managed Virtual Network
Managed private endpoints let ADF's managed runtime reach sources and sinks over Azure Private Link, eliminating public internet exposure for those connections. Enterprise network security, without deploying a VNet of your own and routing traffic through it.
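Each of those connections is a managed private endpoint that targets one Private Link resource and one sub-resource on it. The sketch below builds that ARM resource as a Python dict. The storage account ID is a placeholder, and the resource type and property names reflect my reading of the Microsoft.DataFactory ARM reference (API version 2018-06-01), so verify them before deploying. The owner of the target resource still has to approve the endpoint before traffic flows.

```python
import json


def managed_private_endpoint(factory_name: str, endpoint_name: str,
                             target_resource_id: str, group_id: str) -> dict:
    """ARM resource for an ADF managed private endpoint in the factory's managed VNet."""
    return {
        "type": "Microsoft.DataFactory/factories/managedVirtualNetworks/managedPrivateEndpoints",
        "apiVersion": "2018-06-01",
        # ADF's managed virtual network is named "default".
        "name": f"{factory_name}/default/{endpoint_name}",
        "properties": {
            "privateLinkResourceId": target_resource_id,
            # Sub-resource on the target, e.g. "blob" or "dfs" for a storage account.
            "groupId": group_id,
        },
    }


storage_id = ("/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-data"
              "/providers/Microsoft.Storage/storageAccounts/stlakedev")
print(json.dumps(managed_private_endpoint("adf-dev", "mpe-lake-dfs", storage_id, "dfs"), indent=2))
```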
Where the Gaps Still Are
Eight years of production means I also know exactly where ADF still falls short.
No First-Party Metadata Framework
The metadata-driven pattern, the one that makes ADF scale beyond toy pipelines, is still largely community-built. You maintain the control table schema, the metadata population logic, and the parameter-passing conventions. Microsoft's first-party help stops at copying data: gallery templates such as bulk copy with a control table, and a metadata-driven copy task in the Copy Data tool. Anything beyond bulk copy is yours to design. It's 2022, and that gap has been open since parameterization arrived in v2.
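"You maintain it yourself" means something concrete: a control table schema, validation rules for it, and a fixed mapping from columns to pipeline parameter names. Here's a minimal sketch with every column and parameter name invented for illustration. Real frameworks usually keep the rows in an Azure SQL table rather than in code, but the conventions are the same.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlRow:
    """One row of a hypothetical control table; every field name is a team convention."""
    source_schema: str
    source_table: str
    target_folder: str
    load_type: str = "full"                 # "full" or "incremental"
    watermark_column: Optional[str] = None  # required when load_type == "incremental"
    enabled: bool = True

    def validate(self) -> None:
        """Reject rows the child pipeline could not run, before a pipeline run fails on them."""
        if self.load_type not in ("full", "incremental"):
            raise ValueError(f"{self.source_table}: unknown load_type {self.load_type!r}")
        if self.load_type == "incremental" and not self.watermark_column:
            raise ValueError(f"{self.source_table}: incremental load needs a watermark_column")


def to_pipeline_parameters(row: ControlRow) -> dict:
    """Map one control row to the (hypothetical) parameter names of the child pipeline."""
    row.validate()
    return {
        "SourceSchema": row.source_schema,
        "SourceTable": row.source_table,
        "TargetFolder": row.target_folder,
        "LoadType": row.load_type,
        # Hypothetical convention: an empty string means "no watermark".
        "WatermarkColumn": row.watermark_column or "",
    }


rows = [
    ControlRow("sales", "Orders", "raw/sales/orders", "incremental", "ModifiedAt"),
    ControlRow("sales", "Regions", "raw/sales/regions"),
    ControlRow("hr", "Employees", "raw/hr/employees", enabled=False),
]
batch = [to_pipeline_parameters(r) for r in rows if r.enabled]
for params in batch:
    print(params)
```

None of this is hard. The cost is that every team invents its own version, and nobody else can read it without onboarding.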
The Publish Step
The manual Publish button in the ADF UI, the one that generates the ARM templates in the adf_publish branch, was a design decision that created years of CI/CD pain. Microsoft's eventual fix, the @microsoft/azure-data-factory-utilities npm package, works, but it shouldn't have taken until 2021 for this to be properly solvable without hacks.
ARM Template Complexity
A large factory with 200 resources generates ARM templates measured in thousands of lines. These templates are slow to diff, hard to review in pull requests, and occasionally hit ARM template limits such as the 4 MB size cap or the 256-parameter ceiling. The underlying approach, expressing every factory object as ARM, is theoretically sound but practically painful at scale.
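A cheap pre-flight check catches the limit problem before a release fails. This sketch uses the ARM limits as documented at the time of writing (4 MB, 800 resources, 256 parameters), so check the current Azure docs. It measures the raw JSON only; ARM applies the size limit after expanding variables and copy loops, so treat it as an early warning, not a guarantee. The template below is synthetic. In my experience the parameter count is what large factories hit first, and trimming the custom parameterization file (arm-template-parameters-definition.json) is the usual fix, while ADF's publish output already splits large factories into linked templates.

```python
import json

# ARM limits as documented at the time of writing; verify against current Azure docs.
MAX_TEMPLATE_BYTES = 4 * 1024 * 1024
MAX_RESOURCES = 800
MAX_PARAMETERS = 256


def arm_limit_report(template: dict) -> dict[str, tuple[int, int]]:
    """Return {check: (actual, limit)} for the limits a large factory tends to hit."""
    return {
        "bytes": (len(json.dumps(template).encode("utf-8")), MAX_TEMPLATE_BYTES),
        "resources": (len(template.get("resources", [])), MAX_RESOURCES),
        "parameters": (len(template.get("parameters", {})), MAX_PARAMETERS),
    }


def over_limit(report: dict[str, tuple[int, int]]) -> list[str]:
    """Names of the checks whose actual value exceeds the limit."""
    return [name for name, (actual, limit) in report.items() if actual > limit]


# Synthetic template: 300 connection-string parameters, 200 pipelines.
template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {f"ls_{i}_connectionString": {"type": "secureString"} for i in range(300)},
    "resources": [
        {"type": "Microsoft.DataFactory/factories/pipelines", "apiVersion": "2018-06-01",
         "name": f"adf-dev/pl_{i}", "properties": {}}
        for i in range(200)
    ],
}
report = arm_limit_report(template)
for name, (actual, limit) in report.items():
    print(f"{name}: {actual} / {limit}")
print("over limit:", over_limit(report))
```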
The Synapse Pipelines Question
In 2022, Synapse Pipelines has feature parity with ADF on almost everything that matters: same connectors, same activity types, same expression language, same parameterization, same git integration model. Microsoft's differentiation story — "ADF for enterprise integration teams, Synapse for analytics co-location" — is increasingly thin. This ambiguity is a real cost for teams making platform decisions today.
My Recommendation to the Client
I told them: the technical skills and patterns are 95% transferable between ADF and Synapse Pipelines. Pick based on organizational fit — which team is building this, which workspace makes sense, whether you're using Synapse SQL or Spark pools. If you have legacy SSIS packages, ADF is the answer (Azure-SSIS IR). Otherwise, you're choosing between two products that will likely consolidate into something else within the next two years.
Invest in the patterns, not the product names. The metadata-driven framework pattern, the parameterization model, the CI/CD approach — these transfer. The product names may not.
If you're making the same platform choice, I'm happy to walk through it with you. As always, reach out.