How orchestration works

Routing answers “which single model is best for this prompt?”. Orchestration answers a bigger question: “what’s the best way to spend several models on this prompt?” — sometimes to reach an answer cheaper, sometimes to make it more robust than any single model could be. You set the objective with a policy. PRYSM picks the execution shape — a strategy — from that policy and the prompt, or you force one. Every run returns a PrysmProof v2 that records which models ran and how strongly they agreed.

One-model routing is still the right default for most traffic; within orchestration it is the single strategy. Reach for orchestration when a task is high-stakes or compound, or when you want a verifiable agreement signal across models.

from prysm import Prysm

client = Prysm()
r = client.orchestrate(
    "Compare three database designs for a 40-person team",
    policy="depth",
)
print(r["choices"][0]["message"]["content"])  # final answer text
print(r["prysm"]["orchestration"]["models_used"])  # which models contributed

Policies: the objective dial

A policy says what you’re optimizing for. It’s the only knob most callers need.

efficiency

The cheapest path that still clears a confidence bar. Starts small and only escalates when the answer isn’t good enough.

balanced

The default. A sensible mix of cost and robustness for everyday traffic.

depth

Maximum robustness. Crosses several models in parallel and synthesizes — for high-stakes work where being right matters more than a fraction of a cent.

Each policy sets the confidence bar a cascade must clear before it stops escalating, along with a default ensemble width k. The bar rises from efficiency to depth:

Policy	Confidence bar	Default ensemble width (`k`)
`efficiency`	0.62	2
`balanced`	0.72	2
`depth`	0.85	3
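As a minimal sketch, the table above can be mirrored as a client-side lookup, for example to check a returned confidence against a policy's bar. POLICY_DEFAULTS and clears_bar are hypothetical names, not part of the PRYSM SDK, and treating "clears" as greater-than-or-equal is an assumption; the numbers are copied from the table.

```python
# Hypothetical client-side mirror of the policy table; not part of the PRYSM SDK.
POLICY_DEFAULTS = {
    "efficiency": {"confidence_bar": 0.62, "k": 2},
    "balanced": {"confidence_bar": 0.72, "k": 2},
    "depth": {"confidence_bar": 0.85, "k": 3},
}


def clears_bar(policy: str, confidence: float) -> bool:
    """Return True when a confidence score meets the policy's bar (assumed >=)."""
    return confidence >= POLICY_DEFAULTS[policy]["confidence_bar"]


# The same score can clear one policy's bar and miss another's.
balanced_ok = clears_bar("balanced", 0.80)
depth_ok = clears_bar("depth", 0.80)
```

With a score of 0.80, balanced_ok is true and depth_ok is false, which is why depth keeps escalating where balanced would stop.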

Strategies: the execution shape

A strategy is how the work is carried out. PRYSM auto-plans one, or you can force it by setting strategy. Most map to a published technique, cited below.

single — one best model

Classic routing: classify the prompt, pick the single best-value model, call it once. The cheapest, fastest shape — ideal for trivial prompts.

cascade — cheap first, escalate if needed

Try a cheap model first. If its confidence is below the policy’s bar, escalate to a stronger one, and stop as soon as the bar is cleared. Premium tokens are spent only when the task actually needs them. (FrugalGPT, Chen et al. 2023, arXiv:2305.05176.)
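The escalation loop described above can be sketched as follows. This illustrates the cascade idea under stated assumptions and is not PRYSM's implementation: each model is a stub callable returning an answer and a confidence score, and the names cascade, cheap, and strong are hypothetical.

```python
from typing import Callable, List, Tuple

# A model here is any callable that returns (answer, confidence in 0..1).
Model = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, models: List[Model], bar: float) -> Tuple[str, float, int]:
    """Try models from cheapest to strongest; stop once confidence clears the bar."""
    if not models:
        raise ValueError("cascade needs at least one model")
    answer, confidence, calls = "", 0.0, 0
    for calls, model in enumerate(models, start=1):
        answer, confidence = model(prompt)
        if confidence >= bar:
            break  # bar cleared: no need to pay for a stronger model
    # If no model clears the bar, the last (strongest) answer is returned.
    return answer, confidence, calls


# Stub models standing in for real API calls.
def cheap(prompt: str) -> Tuple[str, float]:
    return ("draft answer", 0.55)


def strong(prompt: str) -> Tuple[str, float]:
    return ("careful answer", 0.90)


answer, confidence, calls = cascade("Summarize this contract", [cheap, strong], bar=0.72)
```

Here the cheap stub scores 0.55, below the 0.72 bar, so the loop escalates once and stops at the second model.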

ensemble_moa — propose across models, then aggregate

Ask k diverse models in parallel, then have an aggregator fuse their proposals into one stronger answer. Robust to any single model’s blind spots. (Mixture-of-Agents, Wang et al. 2024, arXiv:2406.04692.)
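A minimal sketch of the propose-then-aggregate shape, assuming each proposer and the aggregator are plain callables. The stubs below replace real model calls, and none of these names come from the PRYSM SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def ensemble_moa(
    prompt: str,
    proposers: List[Callable[[str], str]],
    aggregator: Callable[[str, List[str]], str],
) -> str:
    """Collect one proposal per model in parallel, then fuse them with an aggregator."""
    if not proposers:
        raise ValueError("ensemble_moa needs at least one proposer")
    with ThreadPoolExecutor(max_workers=len(proposers)) as pool:
        # pool.map keeps the proposals in the same order as the proposers.
        proposals = list(pool.map(lambda propose: propose(prompt), proposers))
    return aggregator(prompt, proposals)


# Stubs standing in for real model calls.
def model_a(prompt: str) -> str:
    return "use Postgres"


def model_b(prompt: str) -> str:
    return "use SQLite"


def join_aggregator(prompt: str, proposals: List[str]) -> str:
    # A real aggregator would be another model call that reads every proposal.
    return " | ".join(proposals)


fused = ensemble_moa("Pick a database", [model_a, model_b], join_aggregator)
```

The aggregator sees every proposal at once, which is what lets it compensate for one model's blind spot.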

rank_fuse — generate candidates, rank, fuse

Generate several candidate answers, rank them, and fuse the best into a final response. (LLM-Blender, Jiang et al. 2023, arXiv:2306.02561.)

decompose_and_route — split, route each part, synthesize

Break a compound prompt into sub-tasks, route each to its own best model (code → a code model, translation → a multilingual model…), then synthesize the parts into one answer. Auto-selected whenever a prompt looks compound.

self_consistency — sample, then take the consensus

Sample the same model k times and keep the answer the samples most agree on. Strong for math and reasoning, where a single sample can slip. (Self-Consistency, Wang et al. 2022, arXiv:2203.11171.)
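The consensus step amounts to a majority vote over the k samples. The sketch below assumes answers can be compared as exact strings after trimming whitespace; a real system would likely normalize or cluster them first. The sample callable is a hypothetical stand-in for a model call.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(prompt: str, sample: Callable[[str], str], k: int) -> Tuple[str, float]:
    """Sample k answers and return the most common one with its vote share."""
    if k < 1:
        raise ValueError("k must be at least 1")
    answers = [sample(prompt).strip() for _ in range(k)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / k


# Stub sampler: replays a fixed sequence instead of calling a model.
canned = iter(["42", "41", "42 "])


def sample_stub(prompt: str) -> str:
    return next(canned)


consensus, share = self_consistency("What is 6 * 7?", sample_stub, k=3)
```

The vote share is a rough stand-in for how settled the answer is: one stray sample does not change the result, but it does lower the share.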

debate — models critique, then converge

Several models answer, read each other’s proposals, and revise across rounds until they converge — or a judge synthesizes the result. Best for contested, open-ended questions. (Du et al. 2023, arXiv:2305.14325.)

How the strategy is chosen

When you don’t force strategy, PRYSM plans one from the policy and the prompt:

Situation	Planned strategy
The prompt is compound (several tasks in one)	`decompose_and_route`
`efficiency` · trivial prompt	`single`
`efficiency` · otherwise	`cascade`
`balanced` · trivial prompt	`single`
`balanced` · hard prompt, 20+ words	`ensemble_moa`
`balanced` · otherwise	`cascade`
`depth` · reasoning / math / analysis prompt	`debate`
`depth` · otherwise	`ensemble_moa`
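The table reads as a small decision function. The sketch below mirrors its rows, assuming the compound check takes precedence over the policy rows. How PRYSM decides that a prompt is trivial, hard, compound, or reasoning-heavy is not described here, so those are passed in as flags, and plan_strategy is a hypothetical name.

```python
def plan_strategy(
    policy: str,
    *,
    compound: bool = False,
    trivial: bool = False,
    hard: bool = False,
    word_count: int = 0,
    reasoning: bool = False,
) -> str:
    """Mirror the planning table; the classification flags are supplied by the caller."""
    if compound:
        return "decompose_and_route"  # assumed to win over every policy row
    if policy == "efficiency":
        return "single" if trivial else "cascade"
    if policy == "balanced":
        if trivial:
            return "single"
        if hard and word_count >= 20:
            return "ensemble_moa"
        return "cascade"
    if policy == "depth":
        # "reasoning" stands for the reasoning / math / analysis row.
        return "debate" if reasoning else "ensemble_moa"
    raise ValueError(f"unknown policy: {policy!r}")


planned = plan_strategy("balanced", hard=True, word_count=25)
```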

The chosen plan and a plain-English reason are returned on every response, so it’s never a black box:

"orchestration": {
  "policy": "depth",
  "strategy": "ensemble_moa",
  "reason": "depth policy on an analysis-heavy prompt: proposed across diverse models, then aggregated.",
  "models_used": ["claude-sonnet-4.5", "gpt-5.2", "gemini-3.1-pro"],
  "confidence": 0.88,
  "agreement": 0.81
}

Confidence and agreement

Multi-model strategies produce two signals you can act on:

confidence (0–1) — how strong the final answer looks, from content-based proxies and (for cascades) whether the bar was cleared.
agreement (0–1) — how much the participating models converged. PRYSM clusters their answers; high agreement across independent models is a strong robustness signal, low agreement is a flag to review.

Both are echoed in prysm.orchestration and sealed into the proof.
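One way to act on the two signals is a small gate that flags a response for human review. This is a sketch: needs_review and its thresholds are illustrative assumptions, and only the prysm.orchestration field names come from the response shown earlier.

```python
def needs_review(response: dict, min_confidence: float = 0.72, min_agreement: float = 0.60) -> bool:
    """Flag a response when either signal falls below an illustrative threshold."""
    orch = response["prysm"]["orchestration"]
    return orch["confidence"] < min_confidence or orch["agreement"] < min_agreement


# Trimmed-down responses carrying only the fields this check reads.
solid = {"prysm": {"orchestration": {"confidence": 0.88, "agreement": 0.81}}}
split = {"prysm": {"orchestration": {"confidence": 0.88, "agreement": 0.40}}}

flag_solid = needs_review(solid)
flag_split = needs_review(split)
```

The second response is flagged even though its confidence is high, because the models did not converge.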

PrysmProof v2

Every orchestration is hashed into a PrysmProof v2: a SHA-256 over the execution stages that records the policy, strategy, the exact models that ran, and the confidence/agreement they reached. It’s logged to the same store as v1, so you can verify any orchestration later:

curl https://api.prysm1.com/v1/proof/{request_id}
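To illustrate why a hash makes the record tamper-evident, the sketch below computes a SHA-256 over a canonical JSON encoding of a proof-like record. The exact fields and byte layout PRYSM hashes are not specified here, so the record shape and the proof_digest helper are assumptions for illustration only.

```python
import hashlib
import json


def proof_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical record carrying the fields the proof is said to capture.
record = {
    "policy": "depth",
    "strategy": "ensemble_moa",
    "models_used": ["claude-sonnet-4.5", "gpt-5.2", "gemini-3.1-pro"],
    "confidence": 0.88,
    "agreement": 0.81,
}

digest = proof_digest(record)
tampered = proof_digest({**record, "agreement": 0.99})
```

Changing any recorded value, such as the agreement score, yields a different digest, so a stored proof can be checked against what was actually run.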

Choosing between routing and orchestration

Use routing (/v1/chat/completions)

High-volume, latency-sensitive, or everyday traffic. One model, lowest cost, OpenAI drop-in.

Use orchestration (/v2/orchestrate)

High-stakes, compound, or contested prompts where robustness — and a verifiable cross-model agreement signal — is worth crossing several models.

Orchestration is available in both SDKs (client.orchestrate(...)), the CLI (prysm orchestrate), and directly via POST /v2/orchestrate.
