GitHub Copilot in Preview: What AI Code Completion Means for Data Pipeline Work

GitHub Copilot landed in technical preview a couple of months ago. I've had six weeks with it now, and I want to give an honest read rather than a reaction, because the first-week reactions I've seen range from "this replaces junior developers" to "completely useless," and neither is accurate.

The truth is more specific than either take, and the specificity is what actually matters for deciding whether to use it.

What Changed in My Daily Workflow

Copilot integrates directly into VS Code, which is where I write most of my data pipeline code. Unlike the Playground workflows I described earlier this year, Copilot is ambient: it's not a separate tool you switch to but a suggestion layer on top of your existing editor. You write a function signature or a comment describing intent, and Copilot offers a completion in gray text. Tab to accept, Escape to dismiss.

The first thing I noticed: I stopped looking up PySpark DataFrame API docs for operations I use infrequently. Window.partitionBy().orderBy() with the right rowsBetween bounds, the exact syntax for F.when().otherwise() with multiple conditions, the right import path for a Spark ML transformer I haven't touched in three months — Copilot handles these without the context switch to a browser tab. That's a genuine workflow improvement even if nothing else it does is useful.

Where the Productivity Story Is Real

Copilot is most useful on code that is structurally well-defined and pattern-consistent. Writing the sixth Airflow sensor in a DAG that already has five? Copilot completes it almost entirely from the pattern. Defining a Parquet output stage that follows the same schema as four other output stages in the same file? It gets it right without being asked.

Schema definitions are the clearest win. If you write:

from pyspark.sql.types import StructType, StructField, StringType, LongType, TimestampType

session_schema = StructType([
    StructField("user_id", StringType(), nullable=False),
    StructField("session_id", StringType(), nullable=False),
    # Copilot fills in the rest from naming conventions in the file
])

Copilot completes the remaining fields based on what it infers from context — field names it's seen used elsewhere in the file, types that match the naming patterns. Not always right. Right often enough that the workflow is: accept, scan, correct the one wrong field rather than type all seven from scratch.
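To make the "scan" step depend less on my eyes, a small check like the one below can help. This is a sketch, not something Copilot or PySpark provides. The schema is flattened to plain (name, type, nullable) tuples so it runs without Spark, and every field beyond user_id and session_id is a made-up placeholder. With a real StructType, I'd expect the tuples to come from schema.fields, using each field's name, dataType.simpleString(), and nullable.

```python
# Hypothetical reference mapping: column name -> (Spark type name, nullable).
# Only user_id and session_id appear in the post; the other fields are
# placeholders invented for this illustration.
EXPECTED_FIELDS = {
    "user_id": ("string", False),
    "session_id": ("string", False),
    "event_count": ("bigint", False),
    "started_at": ("timestamp", False),
}


def schema_mismatches(fields, expected=EXPECTED_FIELDS):
    """Compare (name, type_name, nullable) tuples against the expected mapping.

    Returns a sorted list of human-readable problems. An empty list means
    the accepted schema matches the reference.
    """
    problems = []
    seen = set()
    for name, type_name, nullable in fields:
        seen.add(name)
        if name not in expected:
            problems.append(f"unexpected field: {name}")
            continue
        want_type, want_nullable = expected[name]
        if type_name != want_type:
            problems.append(f"{name}: type {type_name}, expected {want_type}")
        if nullable != want_nullable:
            problems.append(f"{name}: nullable={nullable}, expected {want_nullable}")
    for name in expected:
        if name not in seen:
            problems.append(f"missing field: {name}")
    return sorted(problems)


# A completion that looks right at a glance but narrows one column to int.
accepted = [
    ("user_id", "string", False),
    ("session_id", "string", False),
    ("event_count", "int", False),
    ("started_at", "timestamp", False),
]
print(schema_mismatches(accepted))
```

The point is not the helper itself but that "correct the one wrong field" becomes a list you read rather than a difference you have to spot.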

Where It Fails, and Why That Failure Mode Matters

Business logic is where Copilot is actively dangerous. The model generates plausible-looking code for novel business logic — session reconstruction, funnel calculations, attribution models — and gets it wrong in ways that pass visual inspection. The code is syntactically correct. The algorithm is subtly broken.

I've started treating Copilot suggestions for business logic the same way I'd treat code from an intern who has read a lot of documentation but hasn't solved this specific problem before: look at it, understand it, run the tests. Never accept it on pattern recognition alone.

The discipline this requires is knowing which category your current code falls into: pattern (accept with a scan) or logic (review carefully). That distinction is a skill Copilot doesn't surface for you. You have to bring it.
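As a sketch of what "run the tests" can mean for session reconstruction, here is a plain-Python version with the edge cases written down. The rule itself is my assumption for illustration: events belong to one user, and a gap of 30 minutes or more since the previous event starts a new session. The names assign_sessions and SESSION_TIMEOUT are hypothetical. The tests target the two mistakes that are easiest to miss on a read-through: using > instead of >= at the boundary, and measuring the gap from the start of the session instead of from the previous event.

```python
from datetime import datetime, timedelta

# Assumed rule for this sketch: a gap of 30 minutes or more starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def assign_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Return a session index for each timestamp, aligned with the input order.

    All timestamps are assumed to belong to a single user. Events are
    processed in time order, and a new session begins whenever the gap
    since the previous event is at least `timeout`.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    session_ids = [0] * len(timestamps)
    current = 0
    previous = None
    for i in order:
        ts = timestamps[i]
        if previous is not None and ts - previous >= timeout:
            current += 1
        session_ids[i] = current
        previous = ts
    return session_ids


T0 = datetime(2021, 8, 1, 12, 0)


def minutes(n):
    return T0 + timedelta(minutes=n)


def test_gap_exactly_at_timeout_starts_new_session():
    # 29-minute gap stays in the session; the following 30-minute gap splits.
    assert assign_sessions([minutes(0), minutes(29), minutes(59)]) == [0, 0, 1]


def test_gap_is_measured_from_previous_event_not_session_start():
    # Each gap is 20 minutes, so this is one session even though it spans 40.
    assert assign_sessions([minutes(0), minutes(20), minutes(40)]) == [0, 0, 0]


def test_unsorted_input_keeps_input_alignment():
    # The latest event comes first in the input and lands in the second session.
    assert assign_sessions([minutes(60), minutes(0), minutes(10)]) == [1, 0, 0]
```

A suggestion that swaps >= for >, or compares against the first event of the session, reads fine on screen and fails the first or second test respectively.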

The Right Mental Model

"AI pair programmer" is the wrong frame. A pair programmer understands your intent and pushes back when your approach is wrong. Copilot doesn't know your intent; it knows your context. It's a very fast boilerplate typist who has read whatever is in front of it in your editor and a large fraction of GitHub. Calibrate your expectations accordingly and it's a genuine productivity tool. Expect it to reason about your code and you'll accept suggestions you shouldn't.

I'll keep running it. Six weeks in, the doc-lookup reduction and the schema completion speed are enough to earn it a permanent place in my editor. If the reasoning capability improves, the ceiling gets interesting. Right now the ceiling is pattern completion, and pattern completion is useful.

If you're in the preview and have found Copilot useful in data engineering work beyond what I've described, I'd like to compare notes. As always, I'm here to help.
