Policies: the objective dial
A policy says what you’re optimizing for. It’s the only knob most callers need.
efficiency
The cheapest path that still clears a confidence bar. Starts small and only escalates
when the answer isn’t good enough.
balanced
The default. A sensible mix of cost and robustness for everyday traffic.
depth
Maximum robustness. Crosses several models in parallel and synthesizes — for
high-stakes work where being right matters more than a fraction of a cent.
| Policy | Confidence bar | Default ensemble width (k) |
|---|---|---|
| efficiency | 0.62 | 2 |
| balanced | 0.72 | 2 |
| depth | 0.85 | 3 |
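The table can be read as a small lookup from policy name to defaults. The sketch below is illustrative only: the `PolicyDefaults` type and the `clears_bar` helper are hypothetical names, not part of PRYSM; only the numbers come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDefaults:
    confidence_bar: float  # minimum confidence an answer must reach
    ensemble_width: int    # default k for multi-model strategies


# Values copied from the table above; the structure itself is an assumption.
POLICY_DEFAULTS = {
    "efficiency": PolicyDefaults(confidence_bar=0.62, ensemble_width=2),
    "balanced": PolicyDefaults(confidence_bar=0.72, ensemble_width=2),
    "depth": PolicyDefaults(confidence_bar=0.85, ensemble_width=3),
}


def clears_bar(policy: str, confidence: float) -> bool:
    """Return True when a confidence score meets the policy's bar."""
    return confidence >= POLICY_DEFAULTS[policy].confidence_bar
```

With these values, a confidence of 0.70 would clear the efficiency bar but fall short of balanced and depth.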
Strategies: the execution shape
A strategy is how the work is carried out. PRYSM auto-plans one, or you can force it with strategy. Each maps to a published technique.
single — one best model
Classic routing: classify the prompt, pick the single best-value
model, call it once. The cheapest, fastest shape — ideal for trivial prompts.
cascade — cheap first, escalate if needed
Try a cheap model; if its confidence is below the policy’s bar, escalate to a stronger
one — and stop as soon as the bar is cleared. Spends premium tokens only when the task
actually needs them. (FrugalGPT, Chen et al. 2023.)
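A minimal sketch of the cascade shape, assuming each model is a plain callable that returns an answer together with a confidence score. The stub models and the choice to return the last answer when nothing clears the bar are assumptions for illustration, not PRYSM's implementation.

```python
from typing import Callable, List, Tuple

# Assumption: a model is any callable returning (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, models: List[Model], bar: float) -> Tuple[str, float, int]:
    """Call models from cheapest to strongest and stop once confidence >= bar.

    Returns (answer, confidence, number_of_models_called). If no model
    clears the bar, the answer from the last (strongest) model is returned.
    """
    if not models:
        raise ValueError("cascade needs at least one model")
    answer, confidence, calls = "", 0.0, 0
    for model in models:
        answer, confidence = model(prompt)
        calls += 1
        if confidence >= bar:
            break  # bar cleared: no need to spend on a stronger model
    return answer, confidence, calls


# Stub models standing in for a cheap and a premium model.
def cheap_model(prompt: str) -> Tuple[str, float]:
    return ("maybe 4", 0.55)


def strong_model(prompt: str) -> Tuple[str, float]:
    return ("4", 0.91)
```

Under the balanced bar of 0.72 the cheap stub falls short, so the strong stub is called as well; with a bar of 0.5 only the cheap stub runs.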
ensemble_moa — propose across models, then aggregate
Ask k diverse models in parallel, then have an aggregator fuse their proposals into one stronger answer. Robust to any single model’s blind spots. (Mixture-of-Agents, Wang et al. 2024, arXiv:2406.04692.)
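The ensemble_moa shape can be sketched as a parallel fan-out followed by one aggregation call. The callables here are hypothetical stand-ins: in a real system both the proposers and the aggregator would be model calls, and how PRYSM wires them is not shown in this document.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Proposer = Callable[[str], str]               # prompt -> proposal
Aggregator = Callable[[str, List[str]], str]  # (prompt, proposals) -> final answer


def ensemble_moa(prompt: str, proposers: List[Proposer], aggregator: Aggregator) -> str:
    """Collect one proposal per model in parallel, then fuse them."""
    if not proposers:
        raise ValueError("ensemble_moa needs at least one proposer")
    with ThreadPoolExecutor(max_workers=len(proposers)) as pool:
        # Executor.map preserves the order of `proposers` in its results.
        proposals = list(pool.map(lambda propose: propose(prompt), proposers))
    return aggregator(prompt, proposals)
```

For example, passing two stub proposers and an aggregator that joins the proposals shows the data flow without any network calls.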
rank_fuse — generate candidates, rank, fuse
Generate several candidate answers, rank them, and fuse the best into a final response.
(LLM-Blender, Jiang et al. 2023, arXiv:2306.02561.)
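A simplified sketch of rank_fuse, assuming a pointwise scoring function and a fusing callable, both hypothetical. LLM-Blender itself ranks candidates pairwise and fuses with a generative model; the version below only illustrates the generate, rank, fuse flow.

```python
from typing import Callable, List


def rank_fuse(
    prompt: str,
    candidates: List[str],
    score: Callable[[str, str], float],    # (prompt, candidate) -> quality score
    fuse: Callable[[str, List[str]], str],  # (prompt, top candidates) -> final answer
    top_n: int = 2,
) -> str:
    """Rank candidates by score (highest first) and fuse the best top_n."""
    if not candidates:
        raise ValueError("rank_fuse needs at least one candidate")
    ranked = sorted(candidates, key=lambda c: score(prompt, c), reverse=True)
    return fuse(prompt, ranked[:top_n])
```

The `top_n` default of 2 is an arbitrary choice for the sketch, not a documented PRYSM setting.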
decompose_and_route — split, route each part, synthesize
Break a compound prompt into sub-tasks, route each to its own best model (code →
a code model, translation → a multilingual model…), then synthesize the parts into one
answer. Auto-selected whenever a prompt looks compound.
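The split, route, synthesize flow might look like the sketch below. Every helper is a hypothetical stand-in: the semicolon splitter and keyword classifier are deliberately naive, and how PRYSM actually detects and classifies sub-tasks is not described here.

```python
from typing import Callable, Dict, List


def decompose_and_route(
    prompt: str,
    split: Callable[[str], List[str]],
    classify: Callable[[str], str],
    routes: Dict[str, Callable[[str], str]],
    synthesize: Callable[[List[str]], str],
) -> str:
    """Split a compound prompt, send each part to its model, merge the answers."""
    parts = split(prompt)
    answers = [routes[classify(part)](part) for part in parts]
    return synthesize(answers)


# Naive stand-ins for illustration only.
def split_on_semicolons(prompt: str) -> List[str]:
    return [piece.strip() for piece in prompt.split(";") if piece.strip()]


def keyword_classifier(part: str) -> str:
    text = part.lower()
    if "function" in text:
        return "code"
    if "translate" in text:
        return "translation"
    return "general"


STUB_ROUTES = {
    "code": lambda part: "[code model] " + part,
    "translation": lambda part: "[multilingual model] " + part,
    "general": lambda part: "[general model] " + part,
}
```

Calling it with a prompt such as "Write a function to reverse a list; translate 'hello' to French" would send the first part to the code stub and the second to the multilingual stub.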
self_consistency — sample, then take the consensus
Sample the same model k times and keep the answer the samples most agree on. Strong for math and reasoning where a single sample can slip. (Wang et al. 2022.)
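Self-consistency reduces to a majority vote over k samples. In this sketch `sample` is a hypothetical callable standing in for one stochastic model call, and answers are compared as stripped strings, which is an assumption; a real system would likely extract and normalize the final answer first.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_consistency(prompt: str, sample: Callable[[str], str], k: int) -> Tuple[str, float]:
    """Sample k answers and return (most common answer, share of votes)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    answers = [sample(prompt).strip() for _ in range(k)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / k


def make_scripted_sampler(outputs: List[str]) -> Callable[[str], str]:
    """Build a deterministic stand-in sampler that replays fixed outputs."""
    remaining = iter(outputs)
    return lambda prompt: next(remaining)
```

If five samples come back as 42, 41, 42, 42, 40, the consensus answer is 42 with three of five votes.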
debate — models critique, then converge
Several models answer, read each other’s proposals, and revise across rounds until they
converge — or a judge synthesizes the result. Best for contested, open-ended questions.
(Du et al. 2023, arXiv:2305.14325.)
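A minimal sketch of the debate loop, assuming each agent is a callable that sees the prompt plus its peers' latest answers. The agent and judge signatures, the exact-match convergence test, and the `max_rounds` default are all assumptions for illustration.

```python
from typing import Callable, List, Tuple

Agent = Callable[[str, List[str]], str]  # (prompt, peers' answers) -> answer
Judge = Callable[[str, List[str]], str]  # (prompt, final answers) -> verdict


def debate(prompt: str, agents: List[Agent], judge: Judge, max_rounds: int = 2) -> Tuple[str, int]:
    """Let agents revise against each other; fall back to a judge if they disagree.

    Returns (final answer, number of revision rounds used).
    """
    if len(agents) < 2:
        raise ValueError("debate needs at least two agents")
    answers = [agent(prompt, []) for agent in agents]  # independent first answers
    rounds = 0
    while len(set(answers)) > 1 and rounds < max_rounds:
        # Each agent revises after reading every other agent's previous answer.
        answers = [
            agent(prompt, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
        rounds += 1
    if len(set(answers)) == 1:
        return answers[0], rounds
    return judge(prompt, answers), rounds


# Stub agents: one never changes its mind, one adopts a peer's answer.
def stubborn(prompt: str, peers: List[str]) -> str:
    return "Paris"


def follower(prompt: str, peers: List[str]) -> str:
    return peers[0] if peers else "Lyon"
```

With the two stubs, the follower adopts "Paris" after one round and the judge is never called.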
How the strategy is chosen
When you don’t force strategy, PRYSM plans one from the policy and the prompt:
| Situation | Planned strategy |
|---|---|
| The prompt is compound (several tasks in one) | decompose_and_route |
| efficiency · trivial prompt | single |
| efficiency · otherwise | cascade |
| balanced · trivial prompt | single |
| balanced · hard prompt, 20+ words | ensemble_moa |
| balanced · otherwise | cascade |
| depth · reasoning / math / analysis prompt | debate |
| depth · otherwise | ensemble_moa |
The planned strategy and its reason are returned on every response, so it’s never a black box.
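The planning table can be expressed as a small decision function that returns both a strategy and a reason. This is a sketch under assumptions: it treats the rows as evaluated top to bottom, takes the prompt traits (compound, trivial, hard, reasoning) as inputs because their detection is not documented here, and the reason strings are invented for illustration.

```python
from typing import Tuple


def plan_strategy(
    policy: str,
    *,
    compound: bool = False,
    trivial: bool = False,
    hard: bool = False,
    reasoning: bool = False,
    word_count: int = 0,
) -> Tuple[str, str]:
    """Return (strategy, reason) following the planning table row by row."""
    if compound:
        return "decompose_and_route", "prompt is compound"
    if policy == "efficiency":
        if trivial:
            return "single", "efficiency policy, trivial prompt"
        return "cascade", "efficiency policy, non-trivial prompt"
    if policy == "balanced":
        if trivial:
            return "single", "balanced policy, trivial prompt"
        if hard and word_count >= 20:
            return "ensemble_moa", "balanced policy, hard prompt with 20+ words"
        return "cascade", "balanced policy, default"
    if policy == "depth":
        if reasoning:
            return "debate", "depth policy, reasoning/math/analysis prompt"
        return "ensemble_moa", "depth policy, default"
    raise ValueError(f"unknown policy: {policy!r}")
```

Returning the reason alongside the strategy mirrors the idea that a caller can always see why a given shape was picked.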
Confidence and agreement
Multi-model strategies produce two signals you can act on:
- confidence (0–1): how strong the final answer looks, from content-based proxies and (for cascades) whether the bar was cleared.
- agreement (0–1): how much the participating models converged. PRYSM clusters their answers; high agreement across independent models is a strong robustness signal, low agreement is a flag to review.
Both signals are returned in prysm.orchestration and sealed into the proof.
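One simple way to turn clustered answers into a 0–1 agreement score is the share of answers that fall in the largest cluster. The sketch below assumes clustering by normalized exact match; PRYSM's actual clustering method is not described in this document and is likely more sophisticated.

```python
import re
from collections import Counter
from typing import List


def normalize(answer: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical answers match."""
    return re.sub(r"[^a-z0-9]+", " ", answer.lower()).strip()


def agreement(answers: List[str]) -> float:
    """Share of answers in the largest cluster (1.0 means every model converged)."""
    if not answers:
        raise ValueError("agreement needs at least one answer")
    clusters = Counter(normalize(a) for a in answers)
    largest_cluster_size = clusters.most_common(1)[0][1]
    return largest_cluster_size / len(answers)
```

Three answers of "Paris.", "paris", and "Lyon" would form two clusters, giving an agreement of two thirds.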
PrysmProof v2
Every orchestration is hashed into a PrysmProof v2: a SHA-256 hash over the execution stages that records the policy, strategy, the exact models that ran, and the confidence/agreement they reached. It’s logged to the same store as v1, so you can verify any orchestration later.
Choosing between routing and orchestration
Use routing (/v1/chat/completions)
High-volume, latency-sensitive, or everyday traffic. One model, lowest cost, OpenAI
drop-in.
Use orchestration (/v2/orchestrate)
High-stakes, compound, or contested prompts where robustness — and a verifiable
cross-model agreement signal — is worth crossing several models.
Orchestration is available in both SDKs (client.orchestrate(...)), the CLI (prysm orchestrate), and directly via POST /v2/orchestrate.
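For the direct HTTP route, a request could be assembled roughly as follows. Only the POST /v2/orchestrate path comes from this document. The body field names (`prompt`, `policy`, `strategy`), the bearer-token header, and the base URL are assumptions for illustration, so check the API reference for the real schema before relying on them.

```python
import json
import urllib.request
from typing import Optional


def build_orchestrate_request(
    base_url: str,
    api_key: str,
    prompt: str,
    policy: str = "balanced",
    strategy: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to /v2/orchestrate.

    The JSON field names and the auth scheme are hypothetical.
    """
    body = {"prompt": prompt, "policy": policy}
    if strategy is not None:
        body["strategy"] = strategy  # only set when forcing a strategy
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v2/orchestrate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

The request object can then be passed to `urllib.request.urlopen` once the base URL and schema are confirmed.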