Real-Time Risk Scoring: Wiring Multiple ML Models Behind One REST Endpoint
The field engineers had a problem: they were making risk decisions with data from three different ML models, each running independently, each returning a different score, and no single system aggregating those scores into a decision. One model scored credit risk. Another scored operational risk. A third — newer, still being validated — scored fraud likelihood. A senior engineer was manually correlating results in a spreadsheet once a day.
The ask was a REST endpoint that field engineers could call from their mobile application and get back a unified risk assessment in real time. The answer required wiring those three models into a common API layer without rewriting or retraining any of them.
The Architecture
Each ML model was already deployed as an MLflow model behind its own Databricks Model Serving endpoint: one URL per model, all authenticated with the same workspace token. The new layer was a lightweight FastAPI service that called all three endpoints in parallel, aggregated their scores, and returned a single response.
import asyncio
import httpx
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional
app = FastAPI()
MODEL_ENDPOINTS = {
    'credit_risk': 'https://your-workspace.azuredatabricks.net/serving-endpoints/credit-risk-v2/invocations',
    'operational_risk': 'https://your-workspace.azuredatabricks.net/serving-endpoints/operational-risk-v1/invocations',
    'fraud_likelihood': 'https://your-workspace.azuredatabricks.net/serving-endpoints/fraud-score-v3/invocations',
}
DATABRICKS_TOKEN = "dapi..."  # Read from env/secrets in production
class RiskRequest(BaseModel):
    entity_id: str
    transaction_amount: float
    region_code: str
    account_age_days: int
    prior_disputes: Optional[int] = 0
class RiskResponse(BaseModel):
    entity_id: str
    credit_risk_score: float
    operational_risk_score: float
    fraud_likelihood_score: float
    composite_risk_level: str  # LOW | MEDIUM | HIGH | CRITICAL
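The hardcoded URLs and token above are fine for a walkthrough, but since every URL shares one workspace host and the `/serving-endpoints/<name>/invocations` pattern, they can be derived at startup instead. A minimal sketch, assuming the host and token arrive as the environment variables `DATABRICKS_HOST` and `DATABRICKS_TOKEN`; those variable names, `ENDPOINT_NAMES`, `build_endpoints`, and `load_settings` are all hypothetical.

```python
import os
from collections.abc import Mapping

# Hypothetical mapping from model name to serving-endpoint name;
# the names mirror the URLs hardcoded in MODEL_ENDPOINTS.
ENDPOINT_NAMES = {
    "credit_risk": "credit-risk-v2",
    "operational_risk": "operational-risk-v1",
    "fraud_likelihood": "fraud-score-v3",
}


def build_endpoints(host: str, names: Mapping[str, str]) -> dict[str, str]:
    """Map each model name to its /serving-endpoints/<name>/invocations URL."""
    base = host.rstrip("/")
    return {
        model: f"{base}/serving-endpoints/{endpoint}/invocations"
        for model, endpoint in names.items()
    }


def load_settings(env: Mapping[str, str] = os.environ) -> tuple[dict[str, str], str]:
    """Read host and token from the environment (assumed variable names).

    Raises KeyError at startup if either is missing, instead of failing
    later on the first request.
    """
    host = env["DATABRICKS_HOST"]
    token = env["DATABRICKS_TOKEN"]
    return build_endpoints(host, ENDPOINT_NAMES), token
```

At module load, `MODEL_ENDPOINTS, DATABRICKS_TOKEN = load_settings()` would replace the two hardcoded assignments; failing fast on a missing variable keeps a misconfigured deployment from ever accepting traffic.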
Parallel Model Calls
async def call_model(
    client: httpx.AsyncClient,
    model_name: str,
    url: str,
    payload: dict
) -> tuple[str, float]:
    # POST one feature row to a serving endpoint and return (model_name, score)
    resp = await client.post(
        url,
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        json={"inputs": [payload]},
        timeout=10.0
    )
    resp.raise_for_status()
    result = resp.json()
    # Expects a response body shaped like {"predictions": [<score>]}
    score = result['predictions'][0]
    return model_name, float(score)
@app.post("/risk/assess", response_model=RiskResponse)
async def assess_risk(request: RiskRequest) -> RiskResponse:
    payload = request.model_dump()  # Pydantic v2; use request.dict() on Pydantic v1
    async with httpx.AsyncClient() as client:
        # Fan out to all three endpoints concurrently; latency tracks the slowest call
        tasks = [
            call_model(client, name, url, payload)
            for name, url in MODEL_ENDPOINTS.items()
        ]
        # return_exceptions=True returns failures as values instead of raising the first one
        results = await asyncio.gather(*tasks, return_exceptions=True)
    scores = {}
    for result in results:
        if isinstance(result, Exception):
            # Fail closed: any model failure rejects the whole assessment with a 502
            raise HTTPException(status_code=502, detail=f"Model call failed: {result}")
        model_name, score = result
        scores[model_name] = score
    composite = compute_composite_risk(
        scores['credit_risk'],
        scores['operational_risk'],
        scores['fraud_likelihood']
    )
    return RiskResponse(
        entity_id=request.entity_id,
        credit_risk_score=scores['credit_risk'],
        operational_risk_score=scores['operational_risk'],
        fraud_likelihood_score=scores['fraud_likelihood'],
        composite_risk_level=composite
    )
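The two `asyncio.gather` behaviors this endpoint relies on can be checked without any network access. The sketch below swaps `call_model` for hypothetical stubs (`fake_model`, `failing_model`) that sleep or raise instead of calling Databricks; the delays and scores are made-up illustration values.

```python
import asyncio
import time


# Hypothetical stand-in for call_model: sleeps instead of calling an endpoint.
async def fake_model(name: str, delay: float, score: float) -> tuple[str, float]:
    await asyncio.sleep(delay)
    return name, score


# Hypothetical stand-in for an endpoint that errors out.
async def failing_model(name: str) -> tuple[str, float]:
    raise RuntimeError(f"{name} endpoint unavailable")


async def fan_out() -> tuple[dict[str, float], float]:
    start = time.perf_counter()
    # Three 0.2 s "calls" run concurrently, so the total is about 0.2 s, not 0.6 s
    results = await asyncio.gather(
        fake_model("credit_risk", 0.2, 0.42),
        fake_model("operational_risk", 0.2, 0.31),
        fake_model("fraud_likelihood", 0.2, 0.12),
    )
    return dict(results), time.perf_counter() - start


async def fan_out_with_failure() -> list:
    # With return_exceptions=True the exception comes back in its slot as a value
    return await asyncio.gather(
        fake_model("credit_risk", 0.0, 0.42),
        failing_model("fraud_likelihood"),
        return_exceptions=True,
    )


scores, elapsed = asyncio.run(fan_out())
print(scores, f"{elapsed:.2f}s")
print(asyncio.run(fan_out_with_failure()))
```

The first call shows why the endpoint's latency is bounded by its slowest model rather than the sum of all three; the second shows why `assess_risk` has to inspect each result with `isinstance(result, Exception)` before unpacking it.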
The Composite Risk Logic
def compute_composite_risk(credit: float, operational: float, fraud: float) -> str:
    # Fraud acts as an override: a very high fraud score escalates straight to CRITICAL
    if fraud > 0.85:
        return 'CRITICAL'
    weighted = (credit * 0.4) + (operational * 0.3) + (fraud * 0.3)
    if weighted > 0.75:
        return 'HIGH'
    if weighted > 0.50:
        return 'MEDIUM'
    return 'LOW'
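The thresholds are easier to reason about with numbers attached. For credit = 0.8, operational = 0.7 and fraud = 0.6, the weighted sum is 0.4 × 0.8 + 0.3 × 0.7 + 0.3 × 0.6 = 0.32 + 0.21 + 0.18 = 0.71, which lands in MEDIUM. The table-driven check below replays that case and a few others. It assumes each model returns a score in [0, 1], which the thresholds implicitly require; the inputs are illustrative, not production data.

```python
def compute_composite_risk(credit: float, operational: float, fraud: float) -> str:
    # Copied from the function above so this example runs on its own
    if fraud > 0.85:
        return 'CRITICAL'
    weighted = (credit * 0.4) + (operational * 0.3) + (fraud * 0.3)
    if weighted > 0.75:
        return 'HIGH'
    if weighted > 0.50:
        return 'MEDIUM'
    return 'LOW'


# (credit, operational, fraud) -> expected level, with the weighted sum worked out
CASES = [
    ((0.10, 0.10, 0.90), 'CRITICAL'),  # fraud override; weighted sum never computed
    ((0.90, 0.90, 0.80), 'HIGH'),      # 0.36 + 0.27 + 0.24 = 0.87
    ((0.80, 0.70, 0.60), 'MEDIUM'),    # 0.32 + 0.21 + 0.18 = 0.71
    ((0.20, 0.30, 0.40), 'LOW'),       # 0.08 + 0.09 + 0.12 = 0.29
    ((0.00, 0.00, 0.85), 'LOW'),       # 0.85 is not > 0.85, and 0.3 * 0.85 = 0.255
]

for (credit, operational, fraud), expected in CASES:
    level = compute_composite_risk(credit, operational, fraud)
    assert level == expected, (credit, operational, fraud, level)
    print(credit, operational, fraud, '->', level)
```

The last row is the kind of edge worth showing a risk owner: because the override uses a strict `>` and fraud carries only 0.3 of the weight, a fraud score sitting exactly at 0.85 with low credit and operational scores comes out LOW.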
What This Isn't
This pattern is an aggregation layer, not a new model. Each model still owns its predictions: if the fraud model is wrong, this endpoint is wrong too. The composite risk function is business logic, and its weights and thresholds should be reviewed by someone who understands the domain's risk tolerance, not just by the data engineer who implemented it.
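One way to make that review practical is to pull the weights and thresholds out of the function body into a single validated object that a risk owner can read and sign off on. A minimal sketch: `RiskPolicy` and its field names are hypothetical, and the defaults simply mirror `compute_composite_risk`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskPolicy:
    # Hypothetical policy object; defaults mirror compute_composite_risk
    credit_weight: float = 0.4
    operational_weight: float = 0.3
    fraud_weight: float = 0.3
    fraud_override: float = 0.85
    high_threshold: float = 0.75
    medium_threshold: float = 0.50

    def __post_init__(self) -> None:
        # Reject policies that would silently distort the weighted sum
        total = self.credit_weight + self.operational_weight + self.fraud_weight
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"weights must sum to 1.0, got {total}")
        if not (0.0 <= self.medium_threshold < self.high_threshold <= 1.0):
            raise ValueError("expected 0 <= medium_threshold < high_threshold <= 1")

    def classify(self, credit: float, operational: float, fraud: float) -> str:
        if fraud > self.fraud_override:
            return "CRITICAL"
        weighted = (
            credit * self.credit_weight
            + operational * self.operational_weight
            + fraud * self.fraud_weight
        )
        if weighted > self.high_threshold:
            return "HIGH"
        if weighted > self.medium_threshold:
            return "MEDIUM"
        return "LOW"
```

For example, `RiskPolicy().classify(0.8, 0.7, 0.6)` returns `"MEDIUM"`, while `RiskPolicy(fraud_weight=0.5)` raises `ValueError` because the weights no longer sum to 1. Making the dataclass frozen means a request handler cannot mutate the policy at runtime; the object would be built once at startup, perhaps from a reviewed config file.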
The field engineers got their sub-second unified risk response. The fraud model got flagged for recalibration two weeks later when its scores started drifting. That's a different problem, but at least now there was a single place to look when the scores were wrong.
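Because every score now passes through one service, that service is a natural place to watch for drift. The text does not say how the fraud model's drift was caught; the sketch below is one simple possibility, assuming a per-model baseline mean recorded during validation and an absolute-shift rule. `ScoreDriftMonitor` and its parameters are hypothetical, and a production check would more likely compare whole score distributions (for example with a population stability index or a Kolmogorov-Smirnov test) rather than means.

```python
from collections import deque
from statistics import fmean


class ScoreDriftMonitor:
    """Hypothetical helper: flags a model whose recent mean score leaves its baseline."""

    def __init__(self, baseline_mean: float, window: int = 500, max_shift: float = 0.1) -> None:
        self.baseline_mean = baseline_mean
        self.max_shift = max_shift
        # Keeps only the most recent `window` scores; older ones fall off the left
        self.recent: deque[float] = deque(maxlen=window)

    def observe(self, score: float) -> None:
        self.recent.append(score)

    def drifted(self) -> bool:
        # Only judge a full window, so a handful of requests cannot trigger an alert
        if len(self.recent) < self.recent.maxlen:
            return False
        return abs(fmean(self.recent) - self.baseline_mean) > self.max_shift


monitor = ScoreDriftMonitor(baseline_mean=0.20, window=3, max_shift=0.10)
for s in (0.45, 0.50, 0.55):
    monitor.observe(s)
print(monitor.drifted())  # recent mean 0.50 vs baseline 0.20
```

Wired into `assess_risk`, this would mean one monitor per model name and a call to `observe(score)` as each result is unpacked; what happens when `drifted()` turns true (a log line, a metric, a page) is a separate operational decision.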