The Rubber Duck Gets an Opinion: Using GPT-3 as a Thinking Partner

Rubber duck debugging is one of those techniques that sounds ridiculous until you try it. The premise: explain your problem out loud to an inanimate object. A rubber duck on your desk, a stuffed animal, an empty chair. The act of articulating the problem forces you to structure it — and structured problems are easier to solve than vague ones. Most of the time, you find the bug before the duck has to respond.

The GPT-3 Playground is a rubber duck that responds, and that changes the dynamic in ways worth examining.

The Workflow

I want to be specific about what this looks like, because "use an LLM as a thinking partner" is vague enough to be useless. Here's the actual workflow I've settled into for pipeline architecture decisions.

I open the Playground and write a problem statement — not a request for code, but a description of the situation. The schema I'm working with, the transformation I need, the constraint I'm operating under, and the two or three approaches I'm considering. Something like:

I have a Spark job that needs to join a 500GB event table against a 2GB user dimension table daily. The join key is user_id. The event table is partitioned by event_date and stored in Parquet on S3. The user dimension is updated incrementally — about 50K rows change per day. I'm deciding between: (a) reading the full dimension table on every join, (b) caching the dimension table in Spark's memory tier, or (c) broadcasting it. What are the tradeoffs I should be thinking about?

The model responds with a structured analysis. Sometimes it surfaces a tradeoff I hadn't weighed. More often, it organizes the tradeoffs I'd already identified in a way that makes the decision obvious. The value isn't that the model knows something I don't — it's that the model makes me write a clear problem statement, and clear problem statements tend to contain their own answers.
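One way to keep that discipline is to treat the four parts of the problem statement (schema, transformation, constraint, candidate approaches) as required fields. The following is a minimal sketch, not a tool I'm claiming anyone uses: the ProblemStatement class and its field names are hypothetical, and the rule that you need at least two approaches is an assumption drawn from the "two or three approaches" habit described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemStatement:
    """Hypothetical container for the four parts of a problem statement."""
    schema: str
    transformation: str
    constraint: str
    approaches: List[str]

    def render(self) -> str:
        # Assumption: fewer than two approaches means there is no decision to discuss.
        if len(self.approaches) < 2:
            raise ValueError("list at least two approaches you are weighing")
        # Label the approaches (a), (b), (c), ... as in the example prompt.
        labeled = [
            f"({chr(ord('a') + i)}) {text}"
            for i, text in enumerate(self.approaches)
        ]
        options = ", ".join(labeled[:-1]) + ", or " + labeled[-1]
        return (
            f"{self.schema} {self.transformation} {self.constraint} "
            f"I'm deciding between: {options}. "
            "What are the tradeoffs I should be thinking about?"
        )


statement = ProblemStatement(
    schema="The event table is partitioned by event_date and stored in Parquet on S3.",
    transformation="A daily Spark job joins it against a 2GB user dimension table on user_id.",
    constraint="About 50K dimension rows change per day.",
    approaches=[
        "reading the full dimension table on every join",
        "caching the dimension table in Spark's memory tier",
        "broadcasting it",
    ],
)
print(statement.render())
```

The template does nothing clever. Its only job is to refuse to produce a prompt until every part has been written down, which is where most of the thinking happens.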

Where the Duck Earns Its Keep

The rubber duck effect is strongest on architecture decisions that have multiple defensible answers. Not "what's the right way to read Parquet in PySpark", which has a clear answer, but "should I partition this table by event_date or by user_id given these query patterns". That second kind of question has tradeoffs that depend on assumptions, and articulating those assumptions to a model that responds to them is genuinely useful.

It's also useful for catching false assumptions. When I describe a problem to GPT-3 and it asks a clarifying question, even a naive one, that question sometimes reveals that I've been operating on an unexamined assumption. The model isn't smart enough to identify the assumption directly, but its response to my description exposes the gap in my reasoning.

Where It Fails as a Thinking Partner

GPT-3 doesn't push back effectively. It responds to what you wrote, not to what you meant. If your problem statement contains a flawed premise, the model often accepts the premise and reasons from it rather than challenging it. A good senior engineer would say "wait, why are you assuming the join has to happen at query time? Could you pre-join at ingest?" GPT-3 accepts the query-time join as given and optimizes within that constraint.

This means the quality of the rubber duck session depends entirely on the quality of your problem statement. The model amplifies clear thinking and accepts unclear thinking without complaint. Know which you're bringing.

The One-Shot Limitation

The Playground is a single context window. There's no conversation history. Each prompt stands alone. This means you can't build on a previous exchange — if the model's response raises a follow-up question, you have to re-state the full context in the next prompt or the model loses the thread.

As a result, rubber duck sessions work best as a single, well-formed problem statement rather than an evolving dialogue. Write the problem carefully, get the analysis, extract what's useful, and close the session. Try to have a back-and-forth and you'll spend most of your energy on context management.
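When a follow-up really is unavoidable, the re-stating step can at least be made mechanical. This is a rough sketch under stated assumptions: build_follow_up and the section titles are hypothetical names chosen for illustration, and the function only assembles text to paste into a fresh prompt. It does not call any API.

```python
def build_follow_up(problem_statement: str, model_response: str, follow_up: str) -> str:
    """Assemble a standalone prompt that carries the earlier exchange with it."""
    # Hypothetical section titles; any clear labels would do.
    sections = [
        ("Original problem statement", problem_statement),
        ("Earlier analysis", model_response),
        ("Follow-up question", follow_up),
    ]
    # Each section becomes "Title:\nbody", separated by blank lines.
    return "\n\n".join(f"{title}:\n{body.strip()}" for title, body in sections)


prompt = build_follow_up(
    "I have a Spark job that joins a 500GB event table against a 2GB user dimension table daily.",
    "(paste the model's earlier response here)",
    "How does the answer change if the dimension table grows to 20GB?",
)
print(prompt)
```

Note that the combined text still has to fit in the context window, so long earlier responses may need trimming by hand before they are pasted back in.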

That limitation will matter less once the tooling catches up to the model's capability. For now, the discipline of writing a complete, careful problem statement is most of the value anyway.

If you've been using GPT-3 as a thinking partner and have developed different patterns for the problem statement structure, I'd like to hear what's working. As always, I'm here to help.
