The Context Problem Neither Agent Mesh Nor OpenSharing Solves

I wrote recently about Azure Agent Mesh and OpenSharing, two infrastructure layers that cover how enterprises register, discover, share, and execute agents. Between them, they address a lot of the plumbing that has been missing from the enterprise agent stack.

But there's a gap neither of them touches, and it's the one that determines whether your agents actually produce useful results: the quality of the context you give them.

Agent Mesh tells you how to run agents. OpenSharing tells you how to share agent skills across organizations. Neither tells you how to manufacture context that makes those agents smart about your specific problem, in your specific environment, with your specific history. That's not a protocol problem. It's a memory problem.

The Garbage-In Problem for Agents

The fundamental failure mode I see in production agent deployments is not model capability — it's context quality. An agent reasoning about a pipeline failure has access to a generic system prompt, the immediate error message, and maybe a few recent logs if someone wired that up. It doesn't have the history of how this table has behaved over the last six months. It doesn't know that this exact error pattern appeared twice before and both times it was a schema evolution issue upstream. It doesn't have the context of what remediation worked last time.

That missing context isn't secret or hard to find. It exists in your pipeline run logs, your incident records, your agent's own previous interactions. The problem is that it's scattered across storage systems with no retrieval layer that understands what's relevant right now, for this specific task, weighted by how recent and reliable each piece of information is.

The result is an agent that reasons well but decides poorly, because it's reasoning from an impoverished context. The model isn't the bottleneck. The memory system is.

What Context Manufacturing Actually Requires

Retrieval is the obvious first answer, and it's necessary but not sufficient. A vector similarity search over historical data gets you semantically relevant documents. What it doesn't do:

  • Weight by recency: a note from two years ago about how this table schema worked under a different ETL system is technically relevant but practically misleading. Context needs temporal decay.
  • Fuse multiple signals: the best match under vector similarity isn't always the best match under keyword relevance. A hybrid retrieval that combines semantic search, full-text search, and a reranker produces better results than any single method.
  • Shape for the consumer: a pipeline triage agent needs context in a different shape than a stakeholder report agent. Raw retrieved documents aren't the right unit; consumer-shaped views are.
  • Improve over time: if the context you provided led to a bad agent decision, the memory system should learn from that — flagging the divergence, surfacing it for correction, tightening the retrieval on the next call.
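
To make the recency point concrete, here is a minimal sketch of temporal decay applied on top of a raw relevance score. It is an illustration under stated assumptions, not Cortex Forge's actual scoring: the Hit type, the decayed_score function, and the 90-day half-life are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A retrieved note (hypothetical shape, for illustration only)."""
    doc_id: str
    similarity: float  # raw relevance from retrieval, higher is better
    age_days: float    # age of the underlying note in days


def decayed_score(hit: Hit, half_life_days: float = 90.0) -> float:
    """Scale relevance by a weight that halves every half_life_days.

    The 90-day default is an assumed value, not a recommendation.
    """
    weight = 0.5 ** (hit.age_days / half_life_days)
    return hit.similarity * weight


def rank_with_decay(hits, half_life_days: float = 90.0):
    """Return hits ordered by decayed score, best first."""
    return sorted(hits, key=lambda h: decayed_score(h, half_life_days), reverse=True)


hits = [
    Hit("note-from-old-etl-system", similarity=0.9, age_days=730),
    Hit("note-from-last-week", similarity=0.7, age_days=10),
]
ranked = rank_with_decay(hits)
print([h.doc_id for h in ranked])
```

With these inputs the two-year-old note drops below the recent one even though its raw similarity is higher, which is the behavior the first bullet asks for.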

A memory system that does all four of these is what I'm building with Cortex Forge, and it's why I think of it as a context manufacturing system rather than just a vector database.

How Cortex Forge Approaches It

The architecture follows a medallion model with strict tier separation.

Bronze is the immutable, append-only archive — every raw event, run log, conversation turn, and note captured verbatim. Nothing is ever deleted from Bronze without a human-gated operation. It's the eidetic layer: the guarantee that nothing is lost.

Silver is the derived, system-owned knowledge layer. The engine processes Bronze events into structured notes and extracted facts that are cleaned, deduplicated, and reconciled. Silver is regenerable from Bronze, which means that if the derivation logic improves, you can rebuild it without losing the source record. Human edits are captured as overlay patches: an external authority layered over the system's internal notes. The human wins short-term disputes, while the system's model of revealed behavior continues to accumulate over time.
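
The relationship between Bronze, Silver, and human overlays can be sketched in a few lines. This is a toy model under assumptions, not the real derivation engine: BronzeEvent, derive_silver, and apply_overlays are hypothetical names, and "deduplication" here is just case and whitespace normalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: Bronze records are never mutated in place
class BronzeEvent:
    event_id: int
    table: str
    text: str


def derive_silver(events):
    """Rebuild Silver purely from Bronze: deduplicated notes per table."""
    notes = {}
    for ev in sorted(events, key=lambda e: e.event_id):
        bucket = notes.setdefault(ev.table, [])
        normalized = ev.text.strip().lower()
        if normalized not in bucket:
            bucket.append(normalized)
    return notes


def apply_overlays(silver, overlays):
    """Layer human patches over derived notes; the derived copy is untouched."""
    merged = {table: list(lines) for table, lines in silver.items()}
    for table, replacement in overlays.items():
        merged[table] = list(replacement)  # assumption: the human patch wins
    return merged


events = [
    BronzeEvent(1, "orders", "Schema drift upstream"),
    BronzeEvent(2, "orders", "schema drift upstream "),
    BronzeEvent(3, "orders", "Backfill fixed it"),
]
silver = derive_silver(events)
view = apply_overlays(silver, {"orders": ["root cause: upstream schema evolution"]})
print(silver)
print(view)
```

Because the overlay is stored separately from the derived notes, rerunning derive_silver with better logic never discards a human edit, and removing an overlay never requires touching Bronze.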

Gold is where retrieval happens. Consumer-shaped views over Silver: pgvector indexes for semantic search, BM25 indexes for full-text, per-agent memory sets scoped to specific workflows. A retrieval request against Gold runs hybrid search — HNSW vector + BM25 fused with Reciprocal Rank Fusion, passed through a reranker, filtered by temporal relevance with recency decay. The result isn't a list of documents — it's a ranked, weighted, consumer-shaped context optimized for the specific agent and task making the request.
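
For readers who haven't met Reciprocal Rank Fusion, a minimal sketch follows. It fuses two ranked lists of document IDs by summing 1 / (k + rank) for each document across the lists. The function name and the sample IDs are hypothetical, and k = 60 is a commonly used default rather than a value taken from Cortex Forge; the reranker and recency filter described above are not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse several ranked lists of doc IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by doc ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]  # e.g. results of the HNSW vector search
bm25_hits = ["b", "d", "a"]    # e.g. results of the BM25 full-text search
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))  # ['b', 'a', 'd', 'c']
```

The appeal of the method is that it only needs ranks, so vector distances and BM25 scores never have to be put on a common scale.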

The MCP server is the Gold consumer interface. Any MCP-compatible agent — LangGraph, Claude Code, Copilot, a custom agent behind OmniRoute — hits the same endpoint and gets back context shaped for its declared purpose.

The Reconciler Is What Makes It Self-Improving

The piece that differentiates this from a well-engineered vector store is the reconciler. Cortex Forge tracks the divergence between what the system believes to be true (Silver) and what is revealed through actual behavior (Bronze). When an agent's decision based on the manufactured context leads to a correct outcome, that's a signal. When it doesn't, that's a signal too.

The reconciler surfaces flagged divergences — "you stated X, but six months of behavior suggests Y" — for human review. The human's verdict is itself a Bronze event, feeding back into the accuracy of future Silver derivations. The system gets less wrong over time not through automated self-modification but through a structured human-in-the-loop feedback cycle that the system itself generates.
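
One way to picture the divergence check is as a comparison between a stated belief and the dominant observed value for the same key. The sketch below is an assumption-laden illustration, not the reconciler itself: find_divergences, the key names, and both thresholds are hypothetical.

```python
from collections import Counter


def find_divergences(beliefs, observations, min_observations=3, min_share=0.6):
    """Flag keys where observed behavior contradicts the stated belief.

    beliefs: dict of key -> stated value (the Silver side).
    observations: list of (key, observed_value) pairs (the Bronze side).
    Thresholds are assumed values chosen only for the example.
    """
    by_key = {}
    for key, value in observations:
        by_key.setdefault(key, Counter())[value] += 1

    flagged = []
    for key, stated in beliefs.items():
        counts = by_key.get(key)
        if not counts:
            continue
        total = sum(counts.values())
        observed, n = counts.most_common(1)[0]
        if total >= min_observations and observed != stated and n / total >= min_share:
            flagged.append(
                {"key": key, "stated": stated, "observed": observed, "support": n / total}
            )
    return flagged  # proposals for human review, never applied automatically


beliefs = {"orders.failure_cause": "network timeout", "users.refresh": "daily"}
observations = (
    [("orders.failure_cause", "schema evolution")] * 3
    + [("orders.failure_cause", "network timeout")]
    + [("users.refresh", "daily")] * 4
)
print(find_divergences(beliefs, observations))
```

The function only returns candidates. In the cycle described above, the human's verdict on each candidate would be written back as a new Bronze event rather than edited into Silver directly.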

The governing rule is simple: autonomous action is permitted only for reversible operations. Destructive or irreversible operations — deleting a Bronze record, modifying a human overlay — require human authorization regardless of the system's confidence. Confidence affects ranking and whether to surface a proposal; it never authorizes irreversible action.
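
The rule translates naturally into a small gate function. The following is a sketch under assumptions: Proposal, Decision, gate, and the 0.5 surfacing threshold are hypothetical names and values. The point it illustrates is that confidence decides only whether a proposal is surfaced or acted on when reversible, while irreversible actions run only with explicit human approval.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    NEEDS_HUMAN = "needs_human"
    DROP = "drop"


@dataclass(frozen=True)
class Proposal:
    action: str
    reversible: bool
    confidence: float  # 0.0 to 1.0


def gate(p: Proposal, human_approved: bool = False, surface_threshold: float = 0.5) -> Decision:
    """Decide what happens to a proposed action."""
    if not p.reversible:
        # Irreversible: only explicit human approval can authorize it.
        if human_approved:
            return Decision.EXECUTE
        # Confidence only decides whether the proposal is worth surfacing.
        return Decision.NEEDS_HUMAN if p.confidence >= surface_threshold else Decision.DROP
    # Reversible: the system may act on its own when confident enough.
    return Decision.EXECUTE if p.confidence >= surface_threshold else Decision.DROP


print(gate(Proposal("delete_bronze_record", reversible=False, confidence=0.99)))
print(gate(Proposal("rebuild_gold_index", reversible=True, confidence=0.9)))
```

Note that a confidence of 0.99 on an irreversible action still yields NEEDS_HUMAN; there is no confidence value that turns it into EXECUTE.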

Where This Plugs Into Agent Mesh and OpenSharing

The Cortex Forge MCP server is itself an agent skill in the OpenSharing model. A provider that wants to offer enriched context retrieval — temporal-aware, hybrid-search, consumer-shaped — can publish the skill through OpenSharing's standard share/schema/asset hierarchy, with scoped credentials and zero-copy access. Any recipient who has been granted access can wire the MCP endpoint into their own agent stack without copying any underlying data.

For Azure Agent Mesh, the connection is even more direct. Register the Cortex Forge MCP server in Azure API Center alongside your other agent skills and tools. The API Center data plane MCP server makes it discoverable to any agent in the mesh. An agent running on Foundry Hosted Agents hits the Cortex Forge endpoint the same way it hits any other registered tool — through the unified discovery surface.

Both protocols were already designed to accommodate exactly this kind of infrastructure-as-a-skill. The MCP standard is the seam. Cortex Forge sits on the provider side of that seam, manufacturing context. The agent sits on the consumer side, using it.

The Practical Difference

I've been running agents with and without this kind of memory layer on the same tasks. The difference isn't subtle. An agent with access to a well-manufactured context from Cortex Forge makes better triage decisions on pipeline failures because it can reason about historical patterns, not just the immediate error. It catches recurrences that would otherwise look like new incidents. It proposes remediation approaches that worked before, rather than generating something from first principles.

The models are the same in both cases. The routing layer is the same. The only difference is whether the context going into the model call is generic or manufactured. That difference shows up in every decision the agent makes downstream.

Both OpenSharing and Azure Agent Mesh assume you've solved the context problem. Cortex Forge is my answer to that assumption. As always, I'm here to help if you're thinking through the memory layer for your own agent stack.
