Routing answers “which single model is best for this prompt?”. Orchestration answers a bigger question: “what’s the best way to spend several models on this prompt?” — sometimes to reach an answer cheaper, sometimes to make it more robust than any single model could be. You set the objective with a policy. PRYSM picks the execution shape — a strategy — from that policy and the prompt, or you force one. Every run returns a PrysmProof v2 that records which models ran and how strongly they agreed.
One-model routing is still the right default for most traffic — it’s the single strategy. Reach for orchestration when a task is high-stakes, compound, or when you want a verifiable agreement signal across models.
```python
from prysm import Prysm

client = Prysm()
r = client.orchestrate(
    "Compare three database designs for a 40-person team",
    policy="depth",
)
print(r["choices"][0]["message"]["content"])  # the final answer
print(r["prysm"]["orchestration"]["models_used"])  # which models contributed
```

Policies: the objective dial

A policy says what you’re optimizing for. It’s the only knob most callers need.

efficiency

The cheapest path that still clears a confidence bar. Starts small and only escalates when the answer isn’t good enough.

balanced

The default. A sensible mix of cost and robustness for everyday traffic.

depth

Maximum robustness. Crosses several models in parallel and synthesizes — for high-stakes work where being right matters more than a fraction of a cent.
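Each policy sets the confidence bar a cascade must clear before it stops escalating, along with a default ensemble width (k):

| Policy | Confidence bar | Default ensemble width (k) |
|---|---|---|
| efficiency | 0.62 | 2 |
| balanced | 0.72 | 2 |
| depth | 0.85 | 3 |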
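To make the "bar" concrete, here is a minimal offline sketch of a cascade driven by the confidence bars in the policy table above. It is not PRYSM's implementation: the `cascade` function, the `call(model, prompt) -> (answer, confidence)` signature, the fallback when nothing clears the bar, and the stand-in models `small`/`medium`/`large` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Confidence bars and default k copied from the policy table.
POLICIES = {
    "efficiency": {"bar": 0.62, "k": 2},
    "balanced": {"bar": 0.72, "k": 2},
    "depth": {"bar": 0.85, "k": 3},
}

@dataclass
class Attempt:
    model: str
    answer: str
    confidence: float

# Hypothetical signature: call(model, prompt) -> (answer, confidence in [0, 1]).
ModelCall = Callable[[str, str], tuple[str, float]]

def cascade(prompt: str, ladder: Sequence[str], call: ModelCall,
            policy: str = "balanced") -> tuple[Attempt, list[Attempt]]:
    """Walk the ladder from cheapest to strongest; stop once an answer clears the bar."""
    if not ladder:
        raise ValueError("ladder must contain at least one model")
    bar = POLICIES[policy]["bar"]
    tried: list[Attempt] = []
    for model in ladder:
        answer, confidence = call(model, prompt)
        tried.append(Attempt(model, answer, confidence))
        if confidence >= bar:
            break  # bar cleared: no further escalation, no more premium tokens
    # Assumption: if nothing clears the bar, keep the strongest (last) model's answer.
    return tried[-1], tried

# Stand-in models with fixed confidences, so the sketch runs offline.
FAKE_CONFIDENCE = {"small": 0.55, "medium": 0.78, "large": 0.93}

def fake_call(model: str, prompt: str) -> tuple[str, float]:
    return f"{model}: answer to {prompt!r}", FAKE_CONFIDENCE[model]

final, tried = cascade("Summarize this memo", ["small", "medium", "large"], fake_call)
print(final.model, [a.model for a in tried])  # medium ['small', 'medium']
```

Under `depth` the same ladder escalates all the way to `large`, because 0.78 does not clear the 0.85 bar.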

Strategies: the execution shape

A strategy is how the work is carried out. PRYSM auto-plans one, or you can force one with the strategy parameter. Most strategies map to a published technique:
  • single: classic routing. Classify the prompt, pick the single best-value model, and call it once. The cheapest, fastest shape, ideal for trivial prompts.
  • cascade: try a cheap model; if its confidence is below the policy’s bar, escalate to a stronger one, and stop as soon as the bar is cleared. Premium tokens are spent only when the task actually needs them. (FrugalGPT, Chen et al. 2023.)
  • ensemble_moa: ask k diverse models in parallel, then have an aggregator fuse their proposals into one stronger answer. Robust to any single model’s blind spots. (Mixture-of-Agents, Wang et al. 2024, arXiv:2406.04692.)
  • Rank and fuse: generate several candidate answers, rank them, and fuse the best into a final response. (LLM-Blender, Jiang et al. 2023, arXiv:2306.02561.)
  • decompose_and_route: break a compound prompt into sub-tasks, route each to its own best model (code → a code model, translation → a multilingual model…), then synthesize the parts into one answer. Auto-selected whenever a prompt looks compound.
  • Self-consistency: sample the same model k times and keep the answer the samples most agree on. Strong for math and reasoning, where a single sample can slip. (Wang et al. 2022.)
  • debate: several models answer, read each other’s proposals, and revise across rounds until they converge, or a judge synthesizes the result. Best for contested, open-ended questions. (Du et al. 2023, arXiv:2305.14325.)
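The self-consistency idea above (sample one model k times, keep the answer the samples most agree on) reduces to a majority vote. The sketch below assumes a hypothetical `sample(prompt) -> str` callable and a deliberately crude `normalize` step; real answer matching (for example, extracting the final number from a worked solution) would need more care.

```python
from collections import Counter
from typing import Callable

def normalize(answer: str) -> str:
    """Crude canonical form so '42', ' 42' and '42.' count as one answer."""
    return answer.strip().rstrip(".").lower()

def self_consistency(prompt: str, sample: Callable[[str], str], k: int) -> tuple[str, float]:
    """Draw k samples from one model and return the majority answer and its vote share."""
    if k < 1:
        raise ValueError("k must be at least 1")
    votes = Counter(normalize(sample(prompt)) for _ in range(k))
    # Ties go to the answer seen first (most_common keeps first-encountered order).
    answer, count = votes.most_common(1)[0]
    return answer, count / k

# A canned "model" that returns a different sample on each call.
samples = iter(["42", "41", "42.", " 42", "40"])
answer, share = self_consistency("What is 6 * 7?", lambda _prompt: next(samples), k=5)
print(answer, share)  # 42 0.6
```

The vote share doubles as a rough agreement signal: three of five samples converged here.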

How the strategy is chosen

When you don’t force strategy, PRYSM plans one from the policy and the prompt:
| Situation | Planned strategy |
|---|---|
| The prompt is compound (several tasks in one) | decompose_and_route |
| efficiency · trivial prompt | single |
| efficiency · otherwise | cascade |
| balanced · trivial prompt | single |
| balanced · hard prompt, 20+ words | ensemble_moa |
| balanced · otherwise | cascade |
| depth · reasoning / math / analysis prompt | debate |
| depth · otherwise | ensemble_moa |
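The planning table can be read as a short decision function. This sketch encodes only the rows of the table; the `PromptTraits` fields stand in for PRYSM’s prompt classification, which is not documented here, so they are hypothetical inputs you would have to supply.

```python
from dataclasses import dataclass

@dataclass
class PromptTraits:
    # Hypothetical classifier outputs; how PRYSM actually derives these is not documented.
    compound: bool = False   # several tasks in one prompt
    trivial: bool = False
    hard: bool = False
    reasoning: bool = False  # reasoning / math / analysis prompt
    words: int = 0

def plan_strategy(policy: str, t: PromptTraits) -> str:
    """Mirror the planning table, checking the compound row first for every policy."""
    if t.compound:
        return "decompose_and_route"
    if policy == "efficiency":
        return "single" if t.trivial else "cascade"
    if policy == "balanced":
        if t.trivial:
            return "single"
        if t.hard and t.words >= 20:
            return "ensemble_moa"
        return "cascade"
    if policy == "depth":
        return "debate" if t.reasoning else "ensemble_moa"
    raise ValueError(f"unknown policy: {policy!r}")

print(plan_strategy("balanced", PromptTraits(hard=True, words=34)))  # ensemble_moa
```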
The chosen plan and a plain-English reason are returned on every response, so it’s never a black box:
```json
"orchestration": {
  "policy": "depth",
  "strategy": "ensemble_moa",
  "reason": "depth policy on an analysis-heavy prompt: proposed across diverse models, then aggregated.",
  "models_used": ["claude-sonnet-4.5", "gpt-5.2", "gemini-3.1-pro"],
  "confidence": 0.88,
  "agreement": 0.81
}
```

Confidence and agreement

Multi-model strategies produce two signals you can act on:
  • confidence (0–1): how strong the final answer looks, estimated from content-based proxies and, for cascades, from whether the bar was cleared.
  • agreement (0–1): how much the participating models converged. PRYSM clusters their answers; high agreement across independent models is a strong robustness signal, and low agreement is a flag to review.
Both are echoed in prysm.orchestration and sealed into the proof.
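PRYSM does not document how it clusters answers, so the sketch below swaps in a simple stand-in: greedy clustering by `difflib.SequenceMatcher` similarity, with agreement defined as the share of answers in the largest cluster. The `needs_review` gate and its thresholds are hypothetical examples of acting on the two signals, not PRYSM defaults.

```python
from difflib import SequenceMatcher

def _norm(text: str) -> str:
    return " ".join(text.lower().split())

def agreement(answers: list[str], threshold: float = 0.8) -> float:
    """Share of answers in the largest cluster of near-duplicate answers."""
    if not answers:
        raise ValueError("need at least one answer")
    clusters: list[list[str]] = []
    for ans in map(_norm, answers):
        for cluster in clusters:
            # Join the first cluster whose representative (first member) is similar enough.
            if SequenceMatcher(None, cluster[0], ans).ratio() >= threshold:
                cluster.append(ans)
                break
        else:
            clusters.append([ans])  # no similar cluster: start a new one
    return max(len(c) for c in clusters) / len(answers)

def needs_review(orch: dict, min_confidence: float = 0.7, min_agreement: float = 0.6) -> bool:
    """Hypothetical gate: send low-confidence or low-agreement results to a human."""
    return orch["confidence"] < min_confidence or orch["agreement"] < min_agreement

answers = [
    "Use PostgreSQL with a shared schema.",
    "use postgresql with a shared schema",
    "Use MongoDB.",
]
print(round(agreement(answers), 2))  # 0.67
print(needs_review({"confidence": 0.88, "agreement": 0.81}))  # False
```

Text similarity is only a proxy: two answers can be worded differently yet agree, so an embedding- or judge-based clustering may suit long free-form answers better.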

PrysmProof v2

Every orchestration is hashed into a PrysmProof v2: a SHA-256 hash over the execution stages that records the policy, the strategy, the exact models that ran, and the confidence and agreement they reached. It’s logged to the same store as v1 proofs, so you can verify any orchestration later by its request ID:
```shell
curl https://api.prysm1.com/v1/proof/{request_id}
```
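To see why hashing the stages makes a record tamper-evident, here is a minimal sketch of a SHA-256 hash chain over canonically serialized stages. The stage layout, the all-zero starting digest, and the serialization are assumptions; this will not reproduce real PrysmProof v2 hashes, whose exact format is not documented here.

```python
import hashlib
import json

def stage_digest(prev_hex: str, stage: dict) -> str:
    """Hash one stage together with the previous digest (a simple hash chain)."""
    # Sorted keys and fixed separators make the serialization deterministic.
    payload = json.dumps(stage, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hex + payload).encode("utf-8")).hexdigest()

def proof_hash(stages: list[dict]) -> str:
    digest = "0" * 64  # hypothetical starting value
    for stage in stages:
        digest = stage_digest(digest, stage)
    return digest

stages = [
    {"stage": "plan", "policy": "depth", "strategy": "ensemble_moa"},
    {"stage": "propose", "models": ["claude-sonnet-4.5", "gpt-5.2", "gemini-3.1-pro"]},
    {"stage": "aggregate", "confidence": 0.88, "agreement": 0.81},
]
original = proof_hash(stages)
# Changing any recorded value (here, the agreement) changes the final digest.
tampered = proof_hash(stages[:2] + [{**stages[2], "agreement": 0.95}])
print(original != tampered)  # True
```

Chaining each stage onto the previous digest also means reordering or dropping a stage changes the result, not just editing a value.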

Choosing between routing and orchestration

Use routing (/v1/chat/completions)

High-volume, latency-sensitive, or everyday traffic. One model, lowest cost, OpenAI drop-in.

Use orchestration (/v2/orchestrate)

High-stakes, compound, or contested prompts where robustness — and a verifiable cross-model agreement signal — is worth crossing several models.
Orchestration is available in both SDKs (client.orchestrate(...)), the CLI (prysm orchestrate), and directly via POST /v2/orchestrate.