Six Months With Copilot: The Honest Assessment

Six months is enough time to separate the tool from the hype. I had shipped real work with Copilot in the loop, built and partially rebuilt a context system, and formed opinions about where AI assistance genuinely helps a data engineering workflow and where it introduces overhead that outweighs the benefit.

Here is the honest accounting.

What Worked, and Why

Copilot's value was concentrated in a specific zone: code that follows patterns established well enough to appear in its training data, and whose relevant context is visible in the open editor. Think Airflow DAG scaffolding, SQLAlchemy connection setup, pytest fixture boilerplate, and type-annotated dataclass definitions. In this zone, Copilot consistently saved time, not by generating perfect code, but by generating a starting point close enough that editing it was faster than writing from scratch.

The inline-review habit — reading every suggestion before accepting — had become automatic by month three. The early phase of accepting suggestions reflexively was over. The tool was genuinely useful in its zone, and I had calibrated what that zone was.

ChatGPT as a reasoning partner had also settled into a clear role: difficult domain problems, architectural tradeoffs, code review for logic errors I was too close to see. The stateless nature remained frustrating, but the markdown context file reduced the re-briefing overhead enough to make it workable.

What Didn't Work, and Why

The context management burden was higher than I had expected. Maintaining a useful CONTEXT.md — current, accurate, comprehensive enough to matter — required deliberate effort every week. On weeks where delivery pressure crowded out maintenance, the file drifted. A drifted context file produces subtly wrong AI assistance, which is worse than no assistance because it's confidently wrong.
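One cheap guard against drift is a check that flags the file when it has not been edited recently. The following is a minimal sketch, not the setup described in this post: the seven-day threshold and the function names are assumptions, and modification time is only a rough proxy, since a recently edited file can still be inaccurate.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def context_age_days(path, now=None):
    """Return how many days ago the file at `path` was last modified."""
    now = now or datetime.now(timezone.utc)
    modified = datetime.fromtimestamp(Path(path).stat().st_mtime, tz=timezone.utc)
    return (now - modified) / timedelta(days=1)


def is_stale(path, max_age_days=7, now=None):
    """True when the file has not been modified within the threshold.

    The 7-day default is an assumed value that mirrors a weekly
    maintenance cadence; adjust it to your own rhythm.
    """
    return context_age_days(path, now) > max_age_days
```

Called as `is_stale("CONTEXT.md")` at the start of a session or in a pre-commit hook, it turns silent drift into a visible prompt to review the file.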

The stateless problem had not been solved by the context file. It had been reduced. Every ChatGPT session still required re-establishing context from scratch. The quality of a session was still directly proportional to the quality of context I happened to include at the start. I was still the bottleneck.

Copilot's domain blindness remained. For any task that required project-specific knowledge — the fiscal calendar, the non-standard NULL handling in the source, the agreed transformation sequencing — Copilot was at best neutral and at worst misleading. It would generate code that looked correct and violated a domain constraint I hadn't thought to put in a comment directly above the function.
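To illustrate what putting the constraint directly above the function can look like, here is a hedged sketch. The fiscal rule used (a fiscal year starting on 1 July and named after the calendar year in which it ends) is a made-up stand-in, not the actual calendar from this project, and `fiscal_period` is a hypothetical name.

```python
from datetime import date

# DOMAIN CONSTRAINT (hypothetical example): the fiscal year starts on 1 July
# and is named after the calendar year in which it ends. So 2024-07-01 falls
# in FY2025 Q1, not calendar Q3 of 2024.
FISCAL_YEAR_START_MONTH = 7


def fiscal_period(d):
    """Return (fiscal_year, fiscal_quarter) for a calendar date."""
    # Months elapsed since the fiscal year began, in the range 0..11.
    months_into_fy = (d.month - FISCAL_YEAR_START_MONTH) % 12
    quarter = months_into_fy // 3 + 1
    # Dates on or after the start month belong to the next named fiscal year.
    fiscal_year = d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year
    return fiscal_year, quarter
```

Without the comment, the plausible-looking completion is the calendar-quarter formula `(d.month - 1) // 3 + 1`, which returns 3 for a July date and quietly violates the rule.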

The Metric That Surprised Me

I had expected the primary win to be speed — more code output per hour. What actually happened was different: the energy cost per unit of output dropped. I was less mentally fatigued at the end of a coding day than I had been six months earlier, even on high-output days.

The hypothesis: cognitive load from low-level pattern recall was being offloaded to Copilot, freeing working memory for the parts of the problem that actually required judgment. That matches how automation has tended to pay off for me elsewhere: the work is not necessarily faster, but it is less draining.

The implication is that the right metric for AI-assisted development isn't lines of code per hour. It's sustained output quality over a longer time window. I was producing better work later in the day than I had been before, because the tool was handling the parts of the work that don't require me specifically.

The Knowledge System: Work in Progress

The PostgreSQL-backed knowledge layer was running but not yet integrated into my daily workflow in a meaningful way. I could query it manually. I hadn't yet built the automatic injection step — the layer that would surface relevant context at session start without me deciding what to include.

That gap meant I was still doing manual context assembly for most sessions. The system existed as a better storage layer for knowledge I was accumulating. It wasn't yet delivering on the original promise of automatic, relevance-ranked retrieval.
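For a sense of what the missing injection step might look like, here is a minimal sketch under heavy assumptions. It ranks notes by plain token overlap with the task description and works on an in-memory list; in the real system the notes would come from the PostgreSQL layer, whose schema and ranking method are not described here. `Note`, `rank_notes`, and `build_preamble` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    topic: str
    body: str


def _tokens(text):
    """Lowercased word tokens; a crude stand-in for real relevance features."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def rank_notes(task, notes, limit=3):
    """Return up to `limit` notes sharing the most tokens with the task."""
    task_tokens = _tokens(task)
    scored = []
    for note in notes:
        overlap = len(task_tokens & _tokens(note.topic + " " + note.body))
        if overlap:  # drop notes with nothing in common with the task
            scored.append((overlap, note))
    # The sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note for _, note in scored[:limit]]


def build_preamble(task, notes, limit=3):
    """Format the selected notes as a block to prepend to a session."""
    lines = ["Project context:"]
    lines += [f"- {n.topic}: {n.body}" for n in rank_notes(task, notes, limit)]
    return "\n".join(lines)
```

Token overlap is easy to fool with common words, so this only shows the shape of the step: the task description goes in, a ranked context block comes out, and nobody has to decide by hand what to include.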

Month seven would be about closing that gap. Not completely — the full vision was still a few months out — but enough to validate whether the retrieval approach actually improved session quality when it ran automatically rather than on demand.

The Question I Kept Coming Back To

Six months in, the question I couldn't stop thinking about wasn't "how do I use these tools better?" It was "why does using these tools require this much active management?" A truly useful AI assistant shouldn't require a weekly maintenance session to keep its context accurate. It shouldn't require me to decide which facts to include at the start of each conversation. It shouldn't lose everything when the chat window closes.

The tools I was using were genuinely useful. They were not yet genuinely intelligent. The gap between those two things was where I was spending an increasing amount of engineering time. If you've found ways to close that gap in your own workflow, I want to hear about it.
