Real-Time Risk Scoring: Wiring Multiple ML Models Behind One REST Endpoint

The field engineers had a problem: they were making risk decisions with three ML models that each ran independently and each returned its own score, and no single system aggregated those scores into a decision. One model scored credit risk. Another scored operational risk. A third, newer and still being validated, scored fraud likelihood. Once a day, a senior engineer correlated the results by hand in a spreadsheet.

The ask was a REST endpoint that field engineers could call from their mobile application and get back a unified risk assessment in real time. The answer required wiring those three models into a common API layer without rewriting or retraining any of them.

The Architecture

Each ML model was already deployed as its own Databricks Model Serving endpoint backed by an MLflow model: one endpoint and one URL per model, all authenticated with the same workspace token. The new layer was a lightweight FastAPI service that called all three endpoints in parallel, aggregated the results, and returned a single response.

import asyncio
import httpx
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional

app = FastAPI()

MODEL_ENDPOINTS = {
    'credit_risk': 'https://your-workspace.azuredatabricks.net/serving-endpoints/credit-risk-v2/invocations',
    'operational_risk': 'https://your-workspace.azuredatabricks.net/serving-endpoints/operational-risk-v1/invocations',
    'fraud_likelihood': 'https://your-workspace.azuredatabricks.net/serving-endpoints/fraud-score-v3/invocations',
}
DATABRICKS_TOKEN = "dapi..."  # Read from env/secrets in production

class RiskRequest(BaseModel):
    entity_id: str
    transaction_amount: float
    region_code: str
    account_age_days: int
    prior_disputes: Optional[int] = 0

class RiskResponse(BaseModel):
    entity_id: str
    credit_risk_score: float
    operational_risk_score: float
    fraud_likelihood_score: float
    composite_risk_level: str  # LOW | MEDIUM | HIGH | CRITICAL
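The latency case for this design depends on the fan-out. When the three calls run concurrently, end-to-end time tracks the slowest model instead of the sum of all three. The sketch below uses `asyncio.sleep` in place of the HTTP round trips. The latencies in `SIMULATED_LATENCY` and the `fake_model_call` helper are made-up placeholders, not measurements from the real endpoints.

```python
import asyncio
import time

# Hypothetical per-model latencies (seconds) standing in for serving-endpoint calls
SIMULATED_LATENCY = {
    "credit_risk": 0.20,
    "operational_risk": 0.15,
    "fraud_likelihood": 0.30,
}

async def fake_model_call(name: str, delay: float) -> tuple[str, float]:
    # asyncio.sleep yields to the event loop, as awaiting an HTTP response would
    await asyncio.sleep(delay)
    return name, 0.5  # placeholder score

async def sequential() -> float:
    # Await each call in turn: total time is roughly the sum of the delays
    start = time.perf_counter()
    for name, delay in SIMULATED_LATENCY.items():
        await fake_model_call(name, delay)
    return time.perf_counter() - start

async def parallel() -> float:
    # Schedule all calls at once: total time is roughly the largest delay
    start = time.perf_counter()
    await asyncio.gather(
        *(fake_model_call(name, delay) for name, delay in SIMULATED_LATENCY.items())
    )
    return time.perf_counter() - start

seq_time = asyncio.run(sequential())
par_time = asyncio.run(parallel())
print(f"sequential: {seq_time:.2f}s  parallel: {par_time:.2f}s")
```

With these placeholder numbers, the sequential run takes about 0.65 s and the parallel run about 0.30 s.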

Parallel Model Calls

async def call_model(
    client: httpx.AsyncClient,
    model_name: str,
    url: str,
    payload: dict
) -> tuple[str, float]:
    resp = await client.post(
        url,
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json={"dataframe_records": [payload]},  # row-oriented input for tabular MLflow models
        timeout=10.0
    )
    resp.raise_for_status()
    result = resp.json()
    score = result['predictions'][0]
    return model_name, float(score)
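`call_model` converts `predictions[0]` with `float()` and passes it along. The composite thresholds, however, only make sense if each score is a probability in [0, 1]. Assuming each model returns a single scalar score, one option is a small guard like the hypothetical `validate_score` below.

```python
def validate_score(model_name: str, raw: object) -> float:
    """Coerce a prediction to float and check that it lies in [0, 1].

    Assumption: each model returns one scalar score. Models that return
    class-probability vectors or dicts would need model-specific unpacking first.
    """
    try:
        score = float(raw)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"{model_name}: prediction {raw!r} is not numeric") from exc
    # The chained comparison is False for NaN and infinities, so both are rejected
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"{model_name}: score {score} is outside [0, 1]")
    return score
```

Inside `call_model`, the last line could become `return model_name, validate_score(model_name, score)`. Because `assess_risk` gathers with `return_exceptions=True`, a `ValueError` raised here would surface through the existing loop as a 502, in the same way as an HTTP failure.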

@app.post("/risk/assess", response_model=RiskResponse)
async def assess_risk(request: RiskRequest) -> RiskResponse:
    payload = request.dict()

    async with httpx.AsyncClient() as client:
        tasks = [
            call_model(client, name, url, payload)
            for name, url in MODEL_ENDPOINTS.items()
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)

    scores = {}
    for result in results:
        if isinstance(result, Exception):
            raise HTTPException(status_code=502, detail=f"Model call failed: {result}")
        model_name, score = result
        scores[model_name] = score

    composite = compute_composite_risk(
        scores['credit_risk'],
        scores['operational_risk'],
        scores['fraud_likelihood']
    )

    return RiskResponse(
        entity_id=request.entity_id,
        credit_risk_score=scores['credit_risk'],
        operational_risk_score=scores['operational_risk'],
        fraud_likelihood_score=scores['fraud_likelihood'],
        composite_risk_level=composite
    )
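As written, any single model failure fails the whole request. That may be too strict while the fraud model is still being validated. The hypothetical `gather_scores` helper below is one alternative, sketched under the assumption that each call returns a bare float. It applies a per-call deadline, runs the calls concurrently, and reports which models succeeded and which failed, so the endpoint can decide how to respond.

```python
import asyncio
from collections.abc import Awaitable

async def gather_scores(
    calls: dict[str, Awaitable[float]],
    timeout: float,
) -> tuple[dict[str, float], dict[str, str]]:
    """Await all model calls concurrently and split scores from failures.

    Hypothetical helper: `calls` maps a model name to an awaitable returning a
    float score, and `timeout` is a per-call deadline in seconds.
    """
    names = list(calls)
    results = await asyncio.gather(
        *(asyncio.wait_for(call, timeout) for call in calls.values()),
        return_exceptions=True,  # collect errors instead of raising the first one
    )
    scores: dict[str, float] = {}
    failures: dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            failures[name] = repr(result)  # e.g. a timeout or an HTTP status error
        else:
            scores[name] = result
    return scores, failures

# Usage with stand-in coroutines: one succeeds, one errors, one exceeds the deadline
async def ok(score: float) -> float:
    return score

async def down() -> float:
    raise RuntimeError("endpoint returned 503")

async def slow() -> float:
    await asyncio.sleep(1.0)
    return 0.1

scores, failures = asyncio.run(gather_scores(
    {"credit_risk": ok(0.3), "operational_risk": down(), "fraud_likelihood": slow()},
    timeout=0.2,
))
print(scores, failures)
```

Whether a missing score should fail the request or produce a degraded assessment that is flagged as such is a policy decision. It belongs with the same reviewers who own the composite logic.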

The Composite Risk Logic

def compute_composite_risk(credit: float, operational: float, fraud: float) -> str:
    # Fraud is the only signal with a hard override: a high fraud score escalates immediately
    if fraud > 0.85:
        return 'CRITICAL'
    weighted = (credit * 0.4) + (operational * 0.3) + (fraud * 0.3)
    if weighted > 0.75:
        return 'HIGH'
    if weighted > 0.50:
        return 'MEDIUM'
    return 'LOW'
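Running a few inputs through the rule makes its shape concrete. The function below copies the one above so the snippet runs standalone, and the input tuples are illustrative. Two cases may deserve a domain reviewer's attention. Below the override, fraud carries only 0.3 of the weight, so a fraud score of 0.84 with low credit and operational scores comes out LOW. Maximal credit and operational risk with zero fraud reaches only MEDIUM.

```python
def compute_composite_risk(credit: float, operational: float, fraud: float) -> str:
    # Same rule as above: hard fraud override, then a weighted average
    if fraud > 0.85:
        return 'CRITICAL'
    weighted = (credit * 0.4) + (operational * 0.3) + (fraud * 0.3)
    if weighted > 0.75:
        return 'HIGH'
    if weighted > 0.50:
        return 'MEDIUM'
    return 'LOW'

# (credit, operational, fraud), with the weighted average noted in each comment
cases = [
    (0.2, 0.2, 0.2),   # weighted 0.20 -> LOW
    (0.6, 0.6, 0.6),   # weighted 0.60 -> MEDIUM
    (0.9, 0.9, 0.5),   # weighted 0.78 -> HIGH
    (0.1, 0.1, 0.9),   # fraud above 0.85 -> CRITICAL
    (0.1, 0.1, 0.84),  # weighted 0.32 -> LOW, just under the fraud override
    (1.0, 1.0, 0.0),   # weighted 0.70 -> MEDIUM, maximal credit and operational risk
]
for credit, operational, fraud in cases:
    print(credit, operational, fraud, compute_composite_risk(credit, operational, fraud))
```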

What This Isn't

This pattern is an aggregation layer, not a new model. The individual models still own their predictions: if the fraud model is wrong, this endpoint is wrong too. The composite risk function is business logic, and it should be reviewed by someone who understands the domain's risk tolerance, not just by the data engineer who implemented it.
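Because domain reviewers should own the composite rule, one option is to move its numbers out of the function body into a single validated object. The sketch below assumes a hypothetical `RiskPolicy` dataclass whose defaults reproduce the hard-coded rule. It is one possible structure, not part of the service as described.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RiskPolicy:
    # Defaults reproduce the hard-coded composite rule
    credit_weight: float = 0.4
    operational_weight: float = 0.3
    fraud_weight: float = 0.3
    fraud_override: float = 0.85  # fraud above this is CRITICAL regardless of the rest
    high_threshold: float = 0.75
    medium_threshold: float = 0.50

    def __post_init__(self) -> None:
        # Reject policies that would silently change the meaning of the scale
        total = self.credit_weight + self.operational_weight + self.fraud_weight
        if not math.isclose(total, 1.0):
            raise ValueError(f"weights must sum to 1.0, got {total}")
        if not 0.0 <= self.medium_threshold < self.high_threshold <= 1.0:
            raise ValueError("expected 0 <= medium_threshold < high_threshold <= 1")

    def classify(self, credit: float, operational: float, fraud: float) -> str:
        if fraud > self.fraud_override:
            return "CRITICAL"
        weighted = (credit * self.credit_weight
                    + operational * self.operational_weight
                    + fraud * self.fraud_weight)
        if weighted > self.high_threshold:
            return "HIGH"
        if weighted > self.medium_threshold:
            return "MEDIUM"
        return "LOW"

policy = RiskPolicy()
print(policy.classify(0.6, 0.6, 0.6))  # same inputs and arithmetic as the original rule
```

A policy loaded from reviewed configuration could then replace the constants in `compute_composite_risk`. A threshold change would become a one-line edit to an object the reviewers can read, rather than a change buried in the request path.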

The field engineers got their sub-second unified risk response. Two weeks later, the fraud model was flagged for recalibration when its scores started drifting. That was a different problem, but at least there was now a single place to look when the scores were wrong.
