Starting From Zero: The Hidden Cost of Stateless LLM Sessions

Every GPT-3 Playground session starts cold. The model has no memory of the schema you described last Tuesday, no recollection of the partition strategy decision you worked through together last month, no awareness that you've already ruled out three approaches and the fourth is what you settled on. You open a new session and you're onboarding a new colleague — one who forgets everything the moment the browser tab closes.

I underestimated how much of the LLM workflow cost lives in this re-onboarding. It compounds fast. By the time you've pasted the source schema, explained the config pattern, described the naming conventions, and re-established the constraints from the last session, you've spent 600 tokens on context management before asking your actual question. And if the follow-up happens in a new session, you do it all again.

The fix I've landed on is what I'm calling a context document — and it's changed how I approach LLM sessions for any project that spans more than one conversation.

What a Context Document Is

A context document is a versioned text file that lives in the project repository alongside the code. It describes, in compact structured prose, everything the model needs to give useful answers without re-derivation. It's not a README — the README is for humans navigating the project. The context document is tuned for LLM consumption: dense, structured, no filler.

A working example for a pipeline project:

## Project: customer-event-pipeline

**Stack:** PySpark 3.x on Databricks, Airflow for orchestration, Parquet on S3.

**Naming conventions:**
- Source tables: raw_{entity} (e.g. raw_orders, raw_events)
- Processed tables: processed_{entity} (e.g. processed_orders)
- Partition column: always event_date (DATE, derived from epoch-ms timestamps)
- Timestamps: always epoch milliseconds in source, always UTC

**Active schema — raw_events:**
user_id STRING NOT NULL, event_ts BIGINT (ms) NOT NULL,
event_type STRING (page_view|click|conversion), page_url STRING,
session_id STRING, is_internal BOOLEAN

**Design decisions already made:**
- Broadcast the user dimension (2GB) — tested, fits in executor memory
- Session gap = 30 minutes of inactivity
- Internal events (is_internal=true) excluded at ingest, not downstream
- Partition by event_date only — hour-level partitioning tested, too many prefixes

**Open questions:**
- Late-arriving events policy (currently silently dropped if > 2 days late)
- Whether to denormalize user tier into the events table or keep it as a join

This is roughly 250 tokens. I paste it at the top of every new session. The model now has the context it needs without me typing it from scratch.
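The paste step can also be scripted. The following is a minimal sketch, not a prescribed tool: the file name `CONTEXT.md`, the function names, and the separator are all hypothetical, and the four-characters-per-token figure is only a rough rule of thumb for English text, not a real tokenizer.

```python
from pathlib import Path


def load_context(path: Path) -> str:
    """Read the context document from the repository (path is an assumption)."""
    return path.read_text(encoding="utf-8")


def build_prompt(context: str, question: str) -> str:
    """Put the context document first, then the actual question."""
    return f"{context.strip()}\n\n---\n\nQuestion: {question.strip()}\n"


def rough_token_count(text: str) -> int:
    """Crude size check: assumes about 4 characters per token."""
    return max(1, len(text) // 4)


# Hypothetical usage, assuming the file lives at the repository root:
# context = load_context(Path("CONTEXT.md"))
# print(rough_token_count(context))
# print(build_prompt(context, "How should we handle late-arriving events?"))
```

Keeping the size check next to the prompt builder makes it easy to notice when the document has drifted well past the few hundred tokens it started at.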

Maintaining the Document

The discipline that makes this work: update the context document when you make a decision. Not after the project is done — immediately, the same way you'd update a comment that documents a non-obvious choice. When the "broadcast the user dimension" decision was made, it went into the context document the same day. When we ruled out hour-level partitioning, that went in too.

The context document ends up as a running decision log that's also a prompt template. Two birds, one file.
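One way to lower the friction of updating the file is a small helper that appends a dated bullet under the decisions heading. This is a sketch under stated assumptions: it expects the heading text used in the example document, the `add_decision` name and the "(decided ...)" suffix are hypothetical, and it only handles simple one-line bullets.

```python
from datetime import date
from typing import Optional

# Assumed to match the heading in the example context document.
DECISIONS_HEADING = "**Design decisions already made:**"


def add_decision(doc: str, decision: str, today: Optional[date] = None) -> str:
    """Return the document text with a dated bullet added to the decisions list."""
    lines = doc.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip() == DECISIONS_HEADING),
        None,
    )
    if start is None:
        raise ValueError("decisions heading not found in context document")

    # Skip over the bullets that already sit directly under the heading.
    end = start + 1
    while end < len(lines) and lines[end].startswith("- "):
        end += 1

    stamp = (today or date.today()).isoformat()
    lines.insert(end, f"- {decision} (decided {stamp})")
    return "\n".join(lines) + "\n"
```

Because the function returns new text instead of writing the file, the change can be reviewed and committed like any other edit to the repository.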

What This Makes Possible

With a context document, multi-session LLM collaboration becomes tractable. I can close the browser, come back two weeks later, paste the context, and pick up where I left off without the ten-minute re-onboarding. A colleague working on the same project can do the same — they inherit the context I've been building, not a blank slate.

It also makes the sessions faster. With a well-maintained context document, setup is a single paste of a few hundred tokens rather than a longer stretch of re-explanation, so I get to the actual question almost immediately. The model's useful output per session increases because less of the session is spent on setup.

The stateless limitation of current LLM tooling is real. This is the lowest-friction mitigation I've found.
