De-Identifying Prompts: Protecting Business Logic From the Model

Shannon Lowder

15 Apr 2026 — 4 min read

Image: a single brass key on a dark surface. You hold the translation key. Photo: “Old brass key” by Ivan Radic, licensed under CC BY 2.0.

When you send a prompt to a cloud model, you are sending data to a server you don't control, operated by a company whose data handling practices you've agreed to in a terms of service document you may not have read carefully. For personal projects, that's a risk you can evaluate and accept. For client work, the calculus is different — the data in that prompt may not be yours to send.

De-identification is the practice of removing or obscuring identifying information before it reaches a model provider. It's not a perfect solution, and it introduces its own engineering complexity. It is, for certain categories of work, the only approach that keeps client data under appropriate control.

What's Actually in a Prompt

The risk isn't always obvious because the sensitive information isn't always explicit. Consider a prompt that asks a model to review a data transformation function. The function itself might reference a specific table name, a specific client's naming convention, or a column name that — even without other context — reveals what kind of data the system handles. A function called transform_phi_encounters tells a reader with any domain knowledge that you're handling protected health information.

More subtle: the structure of your code reveals your architecture. Naming conventions reveal your domain. Schema patterns reveal your data model. None of this is a trade secret in isolation. In aggregate, and combined with other context, it describes your client's system in more detail than most clients would consent to if the question were asked directly.

The question I started asking: before I paste this into a cloud model, would I be comfortable if this exact text appeared in a training dataset, or in a log file that got subpoenaed in a legal proceeding? If the answer is no, the text needs to be de-identified before it leaves the machine.
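As a rough illustration of that pre-send question, the check below scans a prompt for identifiers whose name parts appear on a watchlist. It is a minimal sketch: the `SENSITIVE_TERMS` list and the `flag_sensitive_identifiers` helper are hypothetical names invented for this example, and a real watchlist would come from the client engagement rather than a hard-coded set.

```python
import re

# Hypothetical watchlist: terms that hint at a domain or a specific client.
SENSITIVE_TERMS = {"phi", "ssn", "patient", "client", "fiscal"}


def flag_sensitive_identifiers(text: str, terms: set[str] = SENSITIVE_TERMS) -> list[str]:
    """Return identifiers in `text` that contain a watchlist term as a name part."""
    flagged: list[str] = []
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        # Split snake_case and camelCase names into their component words.
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
        if any(part.lower() in terms for part in parts) and identifier not in flagged:
            flagged.append(identifier)
    return flagged


prompt = "Please review def transform_phi_encounters(df): return df.dropna()"
print(flag_sensitive_identifiers(prompt))  # ['transform_phi_encounters']
```

A non-empty result is a signal to stop and de-identify, not proof that an empty result is safe: a watchlist only catches the terms someone thought to put on it.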

The De-Identification Pipeline

De-identification in code prompts is different from de-identification in text documents. The common approach for documents — named entity recognition, regex patterns for known identifier formats — doesn't translate directly to code, where the sensitive identifiers are function names, variable names, and schema definitions rather than person names and account numbers.
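For Python source, one way to gather those candidate identifiers is the standard library's `ast` module rather than a text-oriented NER pass. The sketch below assumes the prompt contains parseable Python; `collect_identifiers` is a hypothetical helper, and it deliberately over-collects so that a person can prune the list.

```python
import ast
import builtins


def collect_identifiers(source: str) -> list[str]:
    """Collect names defined or used in Python source, skipping builtins."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found.add(node.name)   # function and class names
        elif isinstance(node, ast.Name):
            found.add(node.id)     # variables, read or assigned
        elif isinstance(node, ast.arg):
            found.add(node.arg)    # function parameters
        elif isinstance(node, ast.Attribute):
            found.add(node.attr)   # attribute and method names
    return sorted(name for name in found if not hasattr(builtins, name))


source = '''
def transform_phi_encounters(client_accounts):
    return [row.fiscal_period for row in client_accounts]
'''
print(collect_identifiers(source))
# ['client_accounts', 'fiscal_period', 'row', 'transform_phi_encounters']
```

This does not look inside string literals, so table and column names embedded in SQL strings still need separate handling, and library method names will show up alongside domain names.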

The approach I use is substitution-based: replace domain-specific identifiers with generic equivalents before the prompt is sent, and reverse the substitution on the response. A function whose name refers to fiscal period calculations is renamed to something like period_calculations. A table named client_accounts becomes entity_records, or an opaque placeholder such as entity_0001. The model sees generic names; the response comes back with generic names; the substitution map translates the response back to the actual names in the codebase.

Figure: De-identification pipeline. Substitute domain identifiers locally, let the cloud model see only generic names, then reverse the map on the way back; the business logic never leaves your machine.

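import re


class PromptDeidentifier:
    """Session-scoped, reversible map from real identifiers to generic ones."""

    def __init__(self):
        self.substitution_map: dict[str, str] = {}
        self.reverse_map: dict[str, str] = {}
        self._counter = 0

    def substitute(self, identifier: str, category: str = "entity") -> str:
        # Reuse the existing placeholder so each identifier maps consistently.
        if identifier not in self.substitution_map:
            self._counter += 1
            generic = f"{category}_{self._counter:04d}"
            self.substitution_map[identifier] = generic
            self.reverse_map[generic] = identifier
        return self.substitution_map[identifier]

    @staticmethod
    def _replace_all(text: str, mapping: dict[str, str]) -> str:
        # One pass, longest keys first: a shorter key cannot clobber a longer
        # one, and text that was just inserted is never scanned again.
        keys = sorted((k for k in mapping if k), key=len, reverse=True)
        if not keys:
            return text
        pattern = re.compile("|".join(re.escape(k) for k in keys))
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    def deidentify(self, text: str, identifiers: list[str]) -> str:
        # Plain substring matching: "client_accounts" is also replaced
        # inside "client_accounts_v2".
        mapping = {name: self.substitute(name) for name in identifiers if name}
        return self._replace_all(text, mapping)

    def reidentify(self, text: str) -> str:
        return self._replace_all(text, self.reverse_map)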

The substitution map is scoped to a session — the same identifier gets the same generic replacement consistently within a session, so the model can reason about relationships between entities without the identifiers being meaningful outside their substituted context.
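Because the map is the only thing tying generic names back to real ones, it can also be used to sanity-check both directions of a call. The following is a sketch under assumptions, not part of the class above: `find_leaks` and `unknown_placeholders` are hypothetical helpers, and the placeholder pattern assumes the `category_NNNN` format used for generated names.

```python
import re


def find_leaks(outgoing: str, substitution_map: dict[str, str]) -> list[str]:
    """Original identifiers that still appear in text about to be sent."""
    return [name for name in substitution_map if name in outgoing]


def unknown_placeholders(response: str, reverse_map: dict[str, str],
                         category: str = "entity") -> list[str]:
    """Placeholder-shaped tokens in the response that the map cannot reverse."""
    tokens = re.findall(rf"{re.escape(category)}_\d{{4,}}", response)
    return sorted({token for token in tokens if token not in reverse_map})


substitution_map = {"client_accounts": "entity_0001", "fiscal_period": "entity_0002"}
reverse_map = {generic: original for original, generic in substitution_map.items()}

# Assume part of this prompt was assembled by hand and skipped substitution.
outgoing = "SELECT entity_0002 FROM entity_0001 JOIN client_accounts_archive"
response = "Add an index on entity_0001(entity_0002) and on entity_0007."

print(find_leaks(outgoing, substitution_map))        # ['client_accounts']
print(unknown_placeholders(response, reverse_map))   # ['entity_0007']
```

A leak should block the send. An unknown placeholder usually means the model invented a name by analogy, which is worth reviewing before the response is written back into the codebase.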

Using Local Models to Break Down Business Logic

There's a more interesting application of this pattern: using a local model as a pre-processor to translate business logic into technical terms before sending the result to a cloud model.

The idea: your business problem contains domain-specific value. "Calculate the yield on a non-standard bond with these terms" reveals that you're building financial software. "Transform this time series with these aggregation rules" could describe almost anything. The business concept contains the intellectual property; the technical pattern does not.

Running a local model to translate the business description into a technical specification — substituting domain concepts with technical abstractions — strips the identifiable business logic before the cloud model sees it. The cloud model gets the technical problem. The local model holds the translation key. The intellectual property never leaves the machine.
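The flow can be sketched with models treated as plain functions from prompt text to response text. Everything here is illustrative: `solve_with_abstraction`, the prompt wording, and the two stub models are assumptions made for the example, and a real setup would plug in an actual local runtime and a cloud client.

```python
from typing import Callable

# A "model" here is any function from prompt text to response text.
Model = Callable[[str], str]


def solve_with_abstraction(business_request: str, local_model: Model, cloud_model: Model) -> str:
    # 1. The local model rewrites the business request as a domain-neutral spec.
    technical_spec = local_model(
        "Rewrite as a domain-neutral technical specification:\n" + business_request
    )
    # 2. Only the neutral spec crosses the network boundary.
    generic_solution = cloud_model(technical_spec)
    # 3. The local model maps the generic answer back onto the business terms.
    return local_model(
        "Apply this generic solution to the original request.\n"
        f"Request: {business_request}\nSolution: {generic_solution}"
    )


# Stub models so the flow can be run without any model installed.
seen_by_cloud: list[str] = []


def fake_local(prompt: str) -> str:
    if prompt.startswith("Rewrite"):
        return "Aggregate a time series using weighted rules."
    return "Bond yield calculation: " + prompt.splitlines()[-1]


def fake_cloud(prompt: str) -> str:
    seen_by_cloud.append(prompt)
    return "Use a weighted rolling aggregate."


answer = solve_with_abstraction(
    "Calculate the yield on a non-standard bond", fake_local, fake_cloud
)
print(seen_by_cloud)  # ['Aggregate a time series using weighted rules.']
```

Recording what the cloud stub received makes the boundary testable: the business request itself should never appear in that list.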

This isn't foolproof — clever inference from technical patterns can sometimes reconstruct the business domain. But it significantly reduces the information density of what the cloud model receives, which is the achievable goal. Perfect de-identification doesn't exist. Meaningful de-identification does.

The Operational Discipline

De-identification only works if it's applied consistently. A pipeline that de-identifies most prompts but occasionally sends raw text to a cloud model is not meaningfully more private than a pipeline that sends everything raw. The consistency requirement means the de-identification has to be enforced at the infrastructure level — not left to developer discretion on each call.
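One way to take the decision away from the call site is to wrap the cloud call so that callers never touch it directly. The sketch below is a generic illustration, not the API of any particular orchestration layer: `GuardedClient`, its fields, and the "high"/"low" labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GuardedClient:
    """Wraps a cloud call so de-identification cannot be skipped per call."""
    cloud_call: Callable[[str], str]
    deidentify: Callable[[str], str]
    reidentify: Callable[[str], str]
    classify: Callable[[str], str]  # returns "high" or "low" (hypothetical labels)

    def complete(self, prompt: str) -> str:
        if self.classify(prompt) != "low":
            # Fail closed: anything not explicitly "low" is treated as sensitive.
            return self.reidentify(self.cloud_call(self.deidentify(prompt)))
        return self.cloud_call(prompt)


# Stub cloud call that records what it actually received.
sent: list[str] = []


def fake_cloud(prompt: str) -> str:
    sent.append(prompt)
    return "Reviewed: " + prompt


client = GuardedClient(
    cloud_call=fake_cloud,
    deidentify=lambda text: text.replace("client_accounts", "entity_0001"),
    reidentify=lambda text: text.replace("entity_0001", "client_accounts"),
    classify=lambda text: "high" if "client_" in text else "low",
)

print(client.complete("SELECT * FROM client_accounts"))  # Reviewed: SELECT * FROM client_accounts
print(sent)                                               # ['SELECT * FROM entity_0001']
```

The caller gets real names back and never sees the placeholders, while the recorded `sent` list shows that only generic names left the process.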

In the ForgeAI orchestration layer, de-identification is applied automatically for any task that the routing logic classifies as high data sensitivity. The developer doesn't decide whether to de-identify; the routing configuration decides, and the de-identification runs before the model call. Removing the opt-in decision removes the failure mode of forgetting to opt in.

As always, I'm here to help if you want to compare notes on de-identification approaches for your specific situation.
