The Case for Building Your Own Orchestration Layer

There's a question anyone faces before building custom tooling: why not use what already exists? It's a reasonable question, and the answer shouldn't be "because I prefer to build things." It should be a specific list of requirements that existing solutions don't meet, each one with evidence.

Here is the list that justified building a custom orchestration layer, and the evidence behind each item.

Requirement 1: Multi-Provider Routing

I needed to route tasks to different model providers based on task type and data sensitivity. A reasoning-heavy architecture question with no sensitive content goes to Claude. A code review task on client project code goes to a local model. A simple classification task goes to a fast, cheap model. A task that fails one provider's content filter can be retried on another.

Off-the-shelf solutions at the time either assumed a single provider (most tools) or provided routing as a commercial managed service (which re-introduced the data egress problem). Building the router myself meant implementing it once, getting exactly the routing logic I wanted, and retaining control over which data went where.
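
A minimal sketch of that routing logic, assuming two sensitivity levels and three provider labels. The names here (Sensitivity, Task, candidate_providers, and the provider strings) are hypothetical and chosen for illustration; they are not the actual router.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    CLIENT = "client"  # client project code or data; must stay on the local machine


@dataclass(frozen=True)
class Task:
    kind: str  # e.g. "architecture", "code_review", "classification"
    sensitivity: Sensitivity


# Hypothetical provider labels, not real endpoint or model names.
LOCAL = "local-model"
REASONING = "claude"
FAST = "fast-cheap-model"


def candidate_providers(task: Task) -> list[str]:
    """Return providers to try in order; later entries are fallbacks."""
    if task.sensitivity is Sensitivity.CLIENT:
        # Sensitive data never gets a remote fallback.
        return [LOCAL]
    if task.kind == "classification":
        return [FAST, REASONING]
    # Reasoning-heavy work goes to the strongest model first; if that
    # provider's content filter rejects the task, the next entry is tried.
    return [REASONING, LOCAL]
```

Returning an ordered list rather than a single provider keeps the sensitivity rule and the fallback rule in one place: the caller walks the list, and a sensitive task has nowhere to fall back to except the local model.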

Requirement 2: Knowledge System Integration

Every AI task in my workflow potentially benefits from context retrieved from the knowledge system. The retrieval needs to happen at the right point in the task flow — before the model call that needs the context, not at the start of every task regardless of relevance.

Integrating the knowledge system as a first-class tool in the orchestration layer means retrieval is available at any step, invoked when the workflow logic determines it's appropriate. This is different from the proxy approach I had been using — the proxy enriched every request; the tool approach enriches only the requests where the workflow explicitly calls for retrieval. Less noise, better targeting.
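
The difference between the two approaches can be sketched in a few lines. This assumes a retriever is just a function from a query string to a list of text snippets; Step, retrieval_query, and build_prompt are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed shape of the knowledge system's interface: query in, snippets out.
Retriever = Callable[[str], list[str]]


@dataclass
class Step:
    name: str
    prompt_template: str
    # None means this step does not consult the knowledge system at all.
    retrieval_query: Optional[str] = None


def build_prompt(step: Step, task: str, retrieve: Retriever) -> str:
    """Build the model input, retrieving context only if the step asks for it."""
    prompt = step.prompt_template.format(task=task)
    if step.retrieval_query is None:
        return prompt
    snippets = retrieve(step.retrieval_query.format(task=task))
    if not snippets:
        # Retrieval came back empty: send the bare prompt rather than an
        # empty context section.
        return prompt
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    return f"Context:\n{context}\n\n{prompt}"
```

A proxy would call retrieve for every request. Here a step that leaves retrieval_query unset never touches the knowledge system.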

Requirement 3: Output Verification

Model output that doesn't match the expected format breaks downstream steps. A task that expects a JSON object with specific keys fails if the model returns prose, a slightly different JSON structure, or a JSON object wrapped in markdown code fences. These failures happen. The orchestration layer needs to catch them, optionally retry with a more explicit prompt, and surface the failure clearly when the retry budget is exhausted.

Output verification is not a feature of any model. It has to live in the layer that receives the output. Building that layer myself meant implementing the verification logic once and applying it consistently across all model calls in the workflow.
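
A sketch of what that verification step might look like for the JSON case described above. The function names, the VerificationError type, and the default budget of three attempts are assumptions made for this example; call_model stands in for whatever function actually sends the prompt to a provider.

```python
import json
import re
from typing import Callable

# Matches a response that consists entirely of one markdown code fence.
_FENCE = re.compile(r"^```[a-zA-Z0-9_-]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


class VerificationError(Exception):
    """Raised when model output does not match the expected shape."""


def parse_json_object(raw: str, required_keys: set[str]) -> dict:
    text = raw.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)  # unwrap JSON that arrived inside code fences
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise VerificationError(f"response is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise VerificationError("response is JSON but not an object")
    missing = required_keys - set(data)
    if missing:
        raise VerificationError(f"missing expected fields: {sorted(missing)}")
    return data


def call_with_verification(
    call_model: Callable[[str], str],
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict:
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_json_object(call_model(prompt), required_keys)
        except VerificationError as exc:
            last_error = exc
            # Retry with a more explicit instruction appended to the prompt.
            prompt = (
                f"{prompt}\n\nReturn only a JSON object with the keys "
                f"{sorted(required_keys)}. The previous attempt failed: {exc}"
            )
    raise VerificationError(
        f"retry budget exhausted after {max_attempts} attempts: {last_error}"
    )
```

Unwrapping fences before parsing treats the most common near-miss as a success instead of spending a retry on it. Everything else goes through the retry path and, once the budget runs out, surfaces the last specific failure.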

Requirement 4: Transparent State

A workflow that runs for several steps, makes several model calls, and produces an output needs to be debuggable when something goes wrong. Which step failed? What was the model's input at that step? What did the model return? What verification check failed?

"The workflow failed" is not useful. "Step 3 (code review), model call 2 (output verification), expected field 'issues' missing from response" is useful. Getting that level of transparency requires owning the execution layer and logging at each step.

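One way to get messages of that shape is to record every model call as a structured entry and derive the human-readable line from it. This is a sketch using only the standard library; CallRecord, record_call, and first_failure are hypothetical names, and the logger name is arbitrary.

```python
import json
import logging
from dataclasses import asdict, dataclass
from typing import Optional

logger = logging.getLogger("orchestrator")


@dataclass
class CallRecord:
    step: str           # e.g. "Step 3 (code review)"
    call_index: int     # which model call within the step
    model_input: str
    model_output: str
    check: str          # e.g. "output verification"
    passed: bool
    detail: str = ""

    def summary(self) -> str:
        status = "ok" if self.passed else "FAILED"
        base = f"{self.step}, model call {self.call_index} ({self.check}): {status}"
        return f"{base}, {self.detail}" if self.detail else base


def record_call(trace: list[CallRecord], record: CallRecord) -> None:
    """Append to the in-memory trace and emit one JSON line per model call."""
    trace.append(record)
    level = logging.INFO if record.passed else logging.ERROR
    logger.log(level, json.dumps(asdict(record)))


def first_failure(trace: list[CallRecord]) -> Optional[CallRecord]:
    """Return the earliest failed call, or None if every check passed."""
    return next((r for r in trace if not r.passed), None)
```

The full input and output travel with each record, so the questions above (which step, what input, what the model returned, which check failed) can be answered from the trace alone.
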
What LangGraph Provided

LangGraph provided the graph execution model and state management. The graph expressed the workflow structure — which steps ran in which order, what conditions triggered which branches, where retry loops were allowed. The state managed what was known at each point in the workflow execution.

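from typing import TypedDict

from langgraph.graph import StateGraph, END

# Retry budget for the implement/review loop. The value is an assumption.
MAX_RETRIES = 3

class ReviewState(TypedDict):
    issue_description: str
    retrieved_context: list[dict]
    implementation_plan: str
    implementation_code: str
    review_findings: list[str]
    review_passed: bool
    retry_count: int  # assumed to be incremented by the "implement" node

def route_after_verification(state: ReviewState) -> str:
    """Finish on a passing review or an exhausted budget; otherwise retry."""
    if state["review_passed"] or state["retry_count"] >= MAX_RETRIES:
        return "end"
    return "retry"

# The node functions below are defined elsewhere. Each takes the current
# state and returns a dict containing only the fields it updates.
workflow = StateGraph(ReviewState)

workflow.add_node("retrieve_context", retrieve_project_context)
workflow.add_node("plan_implementation", generate_implementation_plan)
workflow.add_node("implement", generate_implementation)
workflow.add_node("review", review_implementation)
workflow.add_node("verify_output", verify_review_output)

# The graph needs an explicit starting node.
workflow.set_entry_point("retrieve_context")
workflow.add_edge("retrieve_context", "plan_implementation")
workflow.add_edge("plan_implementation", "implement")
workflow.add_edge("implement", "review")
workflow.add_edge("review", "verify_output")
workflow.add_conditional_edges(
    "verify_output",
    route_after_verification,
    {"end": END, "retry": "implement"},
)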

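This is readable code that expresses the workflow structure directly. The conditional edge on verify_output implements the retry loop: if the review output doesn't pass verification, the workflow goes back to the implementation step. The retry count in the state is what keeps that loop finite, provided the routing condition also checks it against a budget; a condition that looks only at review_passed would cycle forever on a review that never passes.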

What Building It Revealed

Building the orchestration layer made the problem clearer than any amount of evaluation had. The hard parts weren't the graph structure or the model calls. The hard parts were the boundary conditions: what happens when the knowledge retrieval returns nothing? What happens when the model refuses the task on content grounds? What happens when the retry budget is exhausted and the workflow has produced partial output?

Each of those boundary conditions required a decision. Making those decisions explicitly, in code, was more clarifying than any architecture diagram. The system was the specification.
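
To make "a decision, in code" concrete, here is one possible way to give each terminal condition its own explicit outcome. The decisions encoded below are illustrative assumptions, not the ones the orchestrator actually made; Outcome, WorkflowResult, finalize, and the "refused" state field are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    COMPLETED = "completed"
    REFUSED = "refused"
    RETRIES_EXHAUSTED = "retries_exhausted"


@dataclass
class WorkflowResult:
    outcome: Outcome
    output: Optional[str]
    note: str = ""


def finalize(state: dict, max_retries: int = 3) -> WorkflowResult:
    """Map the final workflow state to exactly one explicit outcome."""
    if state.get("refused"):
        # Assumed decision: a refusal yields no output at all.
        return WorkflowResult(Outcome.REFUSED, None, "model declined the task")
    if state.get("review_passed"):
        return WorkflowResult(Outcome.COMPLETED, state["implementation_code"])
    if state.get("retry_count", 0) >= max_retries:
        # Assumed decision: hand back the partial output, clearly labeled.
        return WorkflowResult(
            Outcome.RETRIES_EXHAUSTED,
            state.get("implementation_code"),
            "partial output; review never passed",
        )
    raise ValueError("workflow has not reached a terminal state")
```

Whatever the actual choices are, forcing every run to end in a named outcome means the caller can never mistake partial output for a passing result.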

The first working version of the ForgeAI orchestrator was not sophisticated. It handled the happy path reliably and the common failure modes adequately. That was enough to start using it on real work, which is always the point where you find the issues that matter.
