Provider Agnosticism as a Design Principle

Provider agnosticism is one of those design principles that sounds like obvious good practice until you're in the middle of a feature and the easiest path is to use a provider-specific API. That's when the principle either holds or it doesn't.

I've been holding it. Here is the argument for why it's worth the friction.

What Provider Lock-In Actually Costs

Lock-in costs are easy to underestimate because they're paid in the future, not in the present. The OpenAI-specific API call you write today is faster to ship than the provider-agnostic abstraction. The cost shows up six months later when you want to switch providers, when OpenAI has a pricing change, when a specific model gets deprecated, or when the terms of service change in a way that affects your use case.

I've hit all of these. Model deprecation: GPT-3.5 Turbo versions cycling through end-of-life dates, each one requiring code changes if the model identifier was hardcoded. Pricing changes: mid-year price adjustments that changed the cost calculus for which provider made sense for which task type. Terms of service: restrictions on using output for training that affected how I could use generated content in my workflow. None of these would have been zero-cost to handle, but all of them would have been more expensive with tightly coupled provider integration.

The Abstraction Layer

The provider abstraction I built was intentionally minimal. A single interface that any provider adapter had to implement:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator

@dataclass
class ModelResponse:
    content: str
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    finish_reason: str

class ModelProvider(ABC):
    @abstractmethod
    def complete(
        self,
        messages: list[dict],
        model: str,
        max_tokens: int = 4096,
        temperature: float = 0.0,
        **kwargs,
    ) -> ModelResponse:
        pass

    @abstractmethod
    def stream(
        self,
        messages: list[dict],
        model: str,
        max_tokens: int = 4096,
        temperature: float = 0.0,
        **kwargs,
    ) -> Iterator[str]:
        pass

Every provider integration — Anthropic, OpenAI, Ollama, any future provider — implements this interface. The orchestration layer calls provider.complete() without knowing which provider it's talking to. Switching providers is a configuration change, not a code change.

The Routing Layer

With multiple providers available, routing decisions matter. Different providers have different strengths, different cost profiles, and different data handling policies. The routing layer makes task-appropriate choices:

class ModelRouter:
    def route(
        self,
        task_type: str,
        data_sensitivity: str,
        latency_priority: str,
    ) -> tuple[ModelProvider, str]:
        if data_sensitivity == "high":
            # High-sensitivity data stays local
            return self.ollama_provider, "hermes3:8b"

        if task_type == "reasoning" and data_sensitivity == "low":
            # Best reasoning quality for non-sensitive tasks
            return self.anthropic_provider, "claude-sonnet-4-6"

        if latency_priority == "fast":
            # Fast inference for latency-sensitive steps
            return self.openai_provider, "gpt-4o-mini"

        # Default: local for everything else
        return self.ollama_provider, "hermes3:8b"

This is simplified. The production version has more parameters and more routing logic. But the principle is the same: the routing decision is data-driven and adjustable, not hardcoded to a single provider.

The Ollama Integration

Adding local inference via Ollama was the piece that made the architecture genuinely resilient. A local model running on hardware I control has none of the cloud provider failure modes: no capacity constraints, no pricing changes, no terms of service updates, no data egress. It's also slower for large models and limited by the hardware available — but for tasks where those tradeoffs are acceptable, it's the right default.

The Ollama adapter implements the same interface as the cloud providers. From the orchestration layer's perspective, Ollama is just another provider. The routing layer decides when to use it based on the task's data sensitivity and latency requirements. Adding Ollama to the provider pool cost less than a day of implementation work. The operational benefit — a fallback that's available even when cloud providers have issues — is worth significantly more than that.

The Principle Under Pressure

The moments where provider agnosticism is most valuable are also the moments where it's most tempting to abandon: when a specific provider ships a feature that would solve your immediate problem elegantly, and the provider-agnostic equivalent would require significant additional work.

The decision I've made consistently: use the provider-specific feature in a provider-specific adapter, but keep the adapter behind the interface. The orchestration layer sees the interface. The adapter does whatever the provider requires. When the provider changes the feature or retires it, the change is contained to the adapter.

That discipline has held through several provider capability changes. It's not free — the adapter layer adds code and the interface sometimes constrains what a provider can do — but the cost is predictable and bounded. The cost of tight coupling is neither predictable nor bounded. As always, I'm here to help if you want to compare notes on how you've handled the provider abstraction problem.

Read more