After seven years of running ADF in production, my mental model of what it is has settled, and that clarity resolves most of the architectural debates I see teams have about ADF versus other tools. Let me put it plainly.
ADF is an orchestrator. It coordinates the execution of tasks across heterogeneous compute services. It does not execute those tasks itself.
What ADF Actually Does
When you run a Copy Activity in ADF, ADF isn't running the data movement on its own servers. The Copy Activity runs on an Azure Integration Runtime (IR) -- a managed compute pool that ADF provisions and manages. ADF tells the IR what to do; the IR does it.
When you run a Mapping Data Flow, ADF provisions a Spark cluster (the Azure IR with Data Flow enabled), submits your transformation as a Spark job, and monitors it. ADF coordinates the execution. Spark does the work.
When you run a Databricks Notebook Activity, ADF calls the Databricks Jobs API and monitors the job until it completes. ADF didn't run any code. Databricks did.
When you run an Azure Function Activity, ADF makes an HTTP call to your Azure Function endpoint. Azure Functions runs your code.
ADF is the coordinator. Every actual computation happens in a compute service that ADF is calling.
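The submit-and-monitor pattern behind all four examples above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ADF's implementation: `FakeComputeService`, `submit`, `get_status`, and `run_activity` are hypothetical names invented for illustration, and the status strings are placeholders.

```python
class FakeComputeService:
    """Hypothetical stand-in for an external compute service.

    Each run reports "Running" for a fixed number of status checks,
    then reports "Succeeded".
    """

    def __init__(self, name, polls_until_done=2):
        self.name = name
        self.polls_until_done = polls_until_done
        self._remaining = {}   # run_id -> status checks left before completion
        self._next_id = 1

    def submit(self, payload):
        """Accept the work and hand back a run id. The service does the work."""
        run_id = self._next_id
        self._next_id += 1
        self._remaining[run_id] = self.polls_until_done
        return run_id

    def get_status(self, run_id):
        if self._remaining[run_id] > 0:
            self._remaining[run_id] -= 1
            return "Running"
        return "Succeeded"


def run_activity(service, payload, max_polls=10):
    """Orchestrator side: submit, then poll until a terminal state.

    A real orchestrator would wait between polls; the sleep is omitted
    so the sketch runs instantly.
    """
    run_id = service.submit(payload)
    for polls in range(1, max_polls + 1):
        status = service.get_status(run_id)
        if status in ("Succeeded", "Failed"):
            return {"service": service.name, "run_id": run_id,
                    "status": status, "polls": polls}
    return {"service": service.name, "run_id": run_id,
            "status": "TimedOut", "polls": max_polls}


result = run_activity(FakeComputeService("databricks"), {"notebook": "/etl/transform"})
```

Note that `run_activity` never touches the payload beyond passing it along. The coordinator only knows how to hand work off and how to ask whether it has finished.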
The Activity Palette Is a Map of the Ecosystem
ADF's activity types are a catalog of the Azure data and compute services it orchestrates:
- Copy Activity -- data movement via Azure IR
- Mapping Data Flow Activity -- Spark transformation via Azure IR (Data Flow enabled)
- Databricks Notebook/Jar/Python Activity -- Databricks compute
- Synapse Notebook Activity -- Synapse Spark pools
- Azure Function Activity -- serverless compute
- Azure ML Pipeline Activity -- ML training and inference
- Stored Procedure Activity -- SQL compute in Azure SQL or SQL Server
- Custom Activity -- Azure Batch for arbitrary code
This isn't a feature list. It's a map of what ADF coordinates. The activity palette is wide because the ecosystem is wide, not because ADF is doing the work itself.
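To make the "map" reading concrete, here is the same list expressed as a lookup table in Python. The activity labels and compute descriptions are taken from the list above; the helper names `compute_for` and `activities_on` are hypothetical and exist only for this sketch.

```python
# Activity type -> the compute service that actually does the work.
ACTIVITY_COMPUTE = {
    "Copy Activity": "Azure Integration Runtime",
    "Mapping Data Flow Activity": "Spark on Azure Integration Runtime (Data Flow enabled)",
    "Databricks Notebook/Jar/Python Activity": "Databricks",
    "Synapse Notebook Activity": "Synapse Spark pools",
    "Azure Function Activity": "Azure Functions",
    "Azure ML Pipeline Activity": "Azure Machine Learning",
    "Stored Procedure Activity": "Azure SQL or SQL Server",
    "Custom Activity": "Azure Batch",
}


def compute_for(activity):
    """Return the service that executes the given activity type."""
    return ACTIVITY_COMPUTE[activity]


def activities_on(keyword):
    """List activity types whose compute description mentions the keyword."""
    return sorted(
        activity
        for activity, compute in ACTIVITY_COMPUTE.items()
        if keyword.lower() in compute.lower()
    )


spark_backed = activities_on("spark")
```

No value in the table is ADF itself, which is the point of the framing.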
How This Framing Resolves "ADF vs. X" Debates
ADF vs. Databricks Workflows
False choice. Use ADF to orchestrate Databricks. Databricks Workflows orchestrate individual Databricks jobs (notebooks, JARs, Python scripts) within the Databricks platform. ADF can call a Databricks job via the Notebook Activity and then do something with the result -- write an audit record, trigger the next pipeline stage, alert on failure. ADF operates at the cross-service orchestration level; Databricks Workflows operate at the within-Databricks job level. They complement each other.
My standard architecture: ADF orchestrates the end-to-end pipeline. ADF calls a Databricks job for the heavy transformation work. ADF handles the surrounding orchestration -- prerequisite checks, error handling, notifications, downstream triggers.
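A rough sketch of that architecture, written as a Python dict that mirrors the general shape of an ADF pipeline definition (activities linked by `dependsOn` entries with dependency conditions). The pipeline and activity names are hypothetical, and the `type` strings reflect my understanding of ADF's pipeline JSON, so check them against the current documentation. Real activities also need `typeProperties` and linked service references, which are left out here.

```python
import json


def activity(name, type_, depends_on=()):
    """Build a minimal activity entry; depends_on holds (upstream, condition) pairs."""
    return {
        "name": name,
        "type": type_,
        "dependsOn": [
            {"activity": upstream, "dependencyConditions": [condition]}
            for upstream, condition in depends_on
        ],
    }


# Hypothetical pipeline: prerequisite check, heavy transform, audit, fan-out, alert.
pipeline = {
    "name": "DailySalesLoad",
    "properties": {
        "activities": [
            activity("CheckSourceFile", "Validation"),
            activity("TransformInDatabricks", "DatabricksNotebook",
                     [("CheckSourceFile", "Succeeded")]),
            activity("WriteAuditRecord", "SqlServerStoredProcedure",
                     [("TransformInDatabricks", "Succeeded")]),
            activity("TriggerDownstream", "ExecutePipeline",
                     [("WriteAuditRecord", "Succeeded")]),
            activity("AlertOnFailure", "WebActivity",
                     [("TransformInDatabricks", "Failed")]),
        ]
    },
}


def triggered_by(pipeline, upstream, condition):
    """Names of activities that run when `upstream` ends with `condition`."""
    return [
        a["name"]
        for a in pipeline["properties"]["activities"]
        for dep in a["dependsOn"]
        if dep["activity"] == upstream and condition in dep["dependencyConditions"]
    ]


on_success = triggered_by(pipeline, "TransformInDatabricks", "Succeeded")
on_failure = triggered_by(pipeline, "TransformInDatabricks", "Failed")
as_json = json.dumps(pipeline, indent=2)
```

Only one of the five activities does heavy lifting. The other four are the surrounding orchestration, and that ratio is typical of the pipelines this pattern produces.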
ADF vs. Azure Functions
Also a false choice. ADF calls Azure Functions. The Azure Function Activity in ADF is specifically designed for this: call a function endpoint, wait for the response, continue or branch based on the output. ADF handles the orchestration; Azure Functions runs your code. These aren't competing options -- they're layers.
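The call, wait, branch sequence looks roughly like this in Python. Everything here is an illustrative assumption: `fake_validate_batch` stands in for the HTTP call to a function endpoint, and the branch names `LoadToWarehouse` and `QuarantineBatch` are made up for the sketch.

```python
def fake_validate_batch(body):
    """Hypothetical function code: the part that Azure Functions would run."""
    rows = body["rows"]
    return {
        "valid": all(row.get("id") is not None for row in rows),
        "row_count": len(rows),
    }


def run_with_branch(call_function, body):
    """Orchestrator side: call, wait for the response, pick the next step."""
    output = call_function(body)  # blocks until the function responds
    if output["valid"]:
        return ("LoadToWarehouse", output)
    return ("QuarantineBatch", output)


good = run_with_branch(fake_validate_batch, {"rows": [{"id": 1}, {"id": 2}]})
bad = run_with_branch(fake_validate_batch, {"rows": [{"id": 1}, {"id": None}]})
```

The validation logic lives entirely in the function. The orchestrator only reads the output and decides where to go next.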
ADF vs. Logic Apps
Different use cases. Logic Apps is designed for event-driven application integration -- receive a webhook, transform a payload, call an API, send an email. It's built around connectors to SaaS applications and event-driven workflows. ADF is built for data pipeline orchestration -- move large datasets, coordinate batch transformations, manage dependencies between data processing jobs.
If you're debating ADF vs. Logic Apps for a data pipeline, you probably want ADF. If you're debating it for an application integration workflow, you probably want Logic Apps.
The Modern Data Platform Stack
With ADF's role clearly defined, the modern Azure data platform stack makes architectural sense:
ADF orchestrates data movement from 90+ sources into Azure Data Lake Storage (raw layer). It coordinates Data Flow transformations. It triggers Databricks jobs for complex transformation logic. It calls stored procedures to update the SQL serving layer. It runs the ML pipeline when new training data arrives.
Azure Data Lake Storage is the storage layer. All data at rest.
Databricks handles complex transformation, ML feature engineering, and anything that benefits from a full Python and Spark environment.
Azure SQL and Synapse Analytics serve the curated data to downstream consumers.
Power BI presents it.
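One way to picture the orchestration role in this stack is as a dependency graph that the orchestrator resolves into an execution order. The sketch below uses Python's standard-library `graphlib` (Python 3.9+). The stage names and the dependencies between them are assumptions made for illustration, not a prescribed layout.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages that must finish before it can start.
STAGES = {
    "ingest_to_raw": set(),
    "data_flow_transform": {"ingest_to_raw"},
    "databricks_transform": {"ingest_to_raw"},
    "update_serving_layer": {"data_flow_transform", "databricks_transform"},
    "run_ml_pipeline": {"databricks_transform"},
}


def execution_order(stages):
    """Return one valid order in which every stage follows its prerequisites."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(STAGES)
```

The graph says nothing about how any stage does its work. It only records what must happen before what, which is the part of the stack ADF owns.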
ADF is the connective tissue in this stack. It's not the star of the show -- Databricks gets more conference talks, Power BI gets more executive attention -- and that's exactly the right role for it. Good orchestration is invisible when it works. ADF works.
Seven Years Later
The platform I started with in 2014 and the platform I'm running in 2021 are the same product in name only. The managed infrastructure, the connector breadth, the parameterization model, the Spark transformation capability, the network security story -- these represent seven years of compounding improvements on a solid architectural foundation.
ADF's role has clarified: it's an orchestrator that coordinates heterogeneous compute. That's the right role. The tool is built for it. The ecosystem is built around it.
If you're architecting a data platform on Azure and want to think through where ADF fits alongside your other compute choices, I'm here to help.