Provider Agnosticism as a Design Principle
Provider agnosticism is one of those design principles that sounds like obvious good practice until you're in the middle of a feature and the easiest path is to use a provider-specific API. That's when the principle either holds or it doesn't.
I've been holding it. Here is the argument for why it's worth the friction.
What Provider Lock-In Actually Costs
Lock-in costs are easy to underestimate because they're paid in the future, not in the present. The OpenAI-specific API call you write today is faster to ship than the provider-agnostic abstraction. The cost shows up six months later when you want to switch providers, when OpenAI has a pricing change, when a specific model gets deprecated, or when the terms of service change in a way that affects your use case.
I've hit all of these. Model deprecation: GPT-3.5 Turbo versions cycling through end-of-life dates, each one requiring code changes if the model identifier was hardcoded. Pricing changes: mid-year price adjustments that changed the cost calculus for which provider made sense for which task type. Terms of service: restrictions on using output for training that affected how I could use generated content in my workflow. None of these would have been zero-cost to handle, but all of them would have been more expensive with tightly coupled provider integration.
The Abstraction Layer
The provider abstraction I built was intentionally minimal. A single interface that any provider adapter had to implement:
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator
@dataclass
class ModelResponse:
content: str
model: str
provider: str
input_tokens: int
output_tokens: int
finish_reason: str
class ModelProvider(ABC):
@abstractmethod
def complete(
self,
messages: list[dict],
model: str,
max_tokens: int = 4096,
temperature: float = 0.0,
**kwargs,
) -> ModelResponse:
pass
@abstractmethod
def stream(
self,
messages: list[dict],
model: str,
max_tokens: int = 4096,
temperature: float = 0.0,
**kwargs,
) -> Iterator[str]:
pass
Every provider integration — Anthropic, OpenAI, Ollama, any future provider — implements this interface. The orchestration layer calls provider.complete() without knowing which provider it's talking to. Switching providers is a configuration change, not a code change.
The Routing Layer
With multiple providers available, routing decisions matter. Different providers have different strengths, different cost profiles, and different data handling policies. The routing layer makes task-appropriate choices:
class ModelRouter:
def route(
self,
task_type: str,
data_sensitivity: str,
latency_priority: str,
) -> tuple[ModelProvider, str]:
if data_sensitivity == "high":
# High-sensitivity data stays local
return self.ollama_provider, "hermes3:8b"
if task_type == "reasoning" and data_sensitivity == "low":
# Best reasoning quality for non-sensitive tasks
return self.anthropic_provider, "claude-sonnet-4-6"
if latency_priority == "fast":
# Fast inference for latency-sensitive steps
return self.openai_provider, "gpt-4o-mini"
# Default: local for everything else
return self.ollama_provider, "hermes3:8b"
This is simplified. The production version has more parameters and more routing logic. But the principle is the same: the routing decision is data-driven and adjustable, not hardcoded to a single provider.
The Ollama Integration
Adding local inference via Ollama was the piece that made the architecture genuinely resilient. A local model running on hardware I control has none of the cloud provider failure modes: no capacity constraints, no pricing changes, no terms of service updates, no data egress. It's also slower for large models and limited by the hardware available — but for tasks where those tradeoffs are acceptable, it's the right default.
The Ollama adapter implements the same interface as the cloud providers. From the orchestration layer's perspective, Ollama is just another provider. The routing layer decides when to use it based on the task's data sensitivity and latency requirements. Adding Ollama to the provider pool cost less than a day of implementation work. The operational benefit — a fallback that's available even when cloud providers have issues — is worth significantly more than that.
The Principle Under Pressure
The moments where provider agnosticism is most valuable are also the moments where it's most tempting to abandon: when a specific provider ships a feature that would solve your immediate problem elegantly, and the provider-agnostic equivalent would require significant additional work.
The decision I've made consistently: use the provider-specific feature in a provider-specific adapter, but keep the adapter behind the interface. The orchestration layer sees the interface. The adapter does whatever the provider requires. When the provider changes the feature or retires it, the change is contained to the adapter.
That discipline has held through several provider capability changes. It's not free — the adapter layer adds code and the interface sometimes constrains what a provider can do — but the cost is predictable and bounded. The cost of tight coupling is neither predictable nor bounded. As always, I'm here to help if you want to compare notes on how you've handled the provider abstraction problem.