When you send a request with model: "auto", PRYSM inspects the prompt, classifies its intent, picks a routing mode, and selects the model that delivers the best result for the lowest cost. The decision is returned in the response’s prysm.routing block, so it’s never a black box.
Want to see the decision without paying for a completion? Call /route (or client.route(...)) for a dry run that returns the chosen model and an estimated cost.

The three modes

PRYSM routes in one of three modes. It selects a mode automatically from the prompt, and you can force one per request with routing_mode.

Quality

Complex, high-stakes work. Routes to premium/frontier models for maximum accuracy and nuance.

Balanced

The default. The best single model for the task at a sensible price.

Agility

Short, simple prompts. Routes to the fastest, cheapest capable model.

How the mode is chosen

| Condition | Mode |
| --- | --- |
| An analysis or reasoning signal and more than 20 words | Quality |
| A simple signal or fewer than 8 words | Agility |
| Everything else | Balanced |

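The rules above can be read as a small decision function. The sketch below is an illustration only, not PRYSM's actual implementation: the name `choose_mode`, counting words by splitting on whitespace, and checking the conditions from top to bottom are all assumptions.

```python
def choose_mode(signals: set[str], prompt: str) -> str:
    """Pick a routing mode from detected signals and prompt length.

    Hypothetical helper: assumes the conditions are checked in the order
    listed and that words are counted by splitting on whitespace.
    """
    word_count = len(prompt.split())
    # Quality: an analysis or reasoning signal AND more than 20 words.
    if signals & {"analysis", "reasoning"} and word_count > 20:
        return "quality"
    # Agility: a simple signal OR fewer than 8 words.
    if "simple" in signals or word_count < 8:
        return "agility"
    # Everything else.
    return "balanced"
```

Under these assumptions, a prompt of exactly 20 words with an analysis signal falls through to Balanced, because the Quality condition requires more than 20 words.
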
Override the automatic choice per request:
```python
client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this report"}],
    extra_body={"routing_mode": "quality"},
)
```

Intent signals

PRYSM detects intent from keywords and language. A prompt can match several signals at once; routing weighs them together with the word count and mode.

| Signal | Fires on prompts about… |
| --- | --- |
| code | functions, debugging, languages, APIs, SQL, regex, deploys |
| write | drafting, essays, blogs, emails, copy, editing, tone |
| analysis | analyze, research, compare, evaluate, strategy, reports |
| math | calculations, equations, statistics, proofs, calculus |
| translate | translation between languages |
| realtime | today, latest, current, news, prices, live data |
| simple | quick lookups, definitions, “what is”, conversions |
| multimodal | images, photos, diagrams, charts, video |
| reasoning | step-by-step logic, philosophy, ethics, debate, proofs |

PRYSM also detects Chinese and Japanese text and biases toward models that specialize in those languages.
The signals PRYSM actually detected for a request are echoed back in prysm.routing.signals_detected, so you can always see what drove the decision.
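
As a minimal sketch of reading that field, assume the response has already been parsed into a plain Python dict (an SDK response object may expose it differently). The helper name `detected_signals` and the sample payload are hypothetical; only the `prysm.routing.signals_detected` path comes from the text above.

```python
def detected_signals(response: dict) -> list[str]:
    """Return the signals echoed in prysm.routing.signals_detected, or []."""
    routing = response.get("prysm", {}).get("routing", {})
    return list(routing.get("signals_detected", []))


# Hypothetical payload for illustration only; a real response has more fields.
example = {"prysm": {"routing": {"signals_detected": ["code", "simple"]}}}
print(detected_signals(example))
```

Returning an empty list when the block is missing keeps logging code simple, for example when a response comes from a path that did not go through routing.
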

Worked examples

These show the default (no BRAIN.md) behavior:
Three words and a simple signal trigger Agility. PRYSM routes to a fast budget model (e.g. deepseek-v3.2 at $0.28/MTok), because paying frontier prices for a fact lookup would be wasteful.
A code signal with a moderate length lands in Balanced. Short code prompts route to deepseek-v3.2 (“95% cheaper than GPT-5.2”); longer or higher-stakes code routes to claude-sonnet-4.5 for instruction-following and quality.
An analysis signal with more than 20 words triggers Quality. PRYSM routes to a premium reasoning model (e.g. gpt-5.2) — accuracy matters more than saving a fraction of a cent here.
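
To put the per-token price in perspective, here is a rough arithmetic sketch. It assumes a single blended price per million tokens (real pricing may differ for input and output tokens), and the 2,000-token request size is a made-up figure for illustration.

```python
def estimate_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost of `tokens` tokens at a price quoted per million tokens (MTok)."""
    return tokens / 1_000_000 * price_per_mtok


# Hypothetical 2,000-token request at the $0.28/MTok price quoted above.
budget_cost = estimate_cost_usd(2_000, 0.28)
print(f"${budget_cost:.5f}")
```

At that price the request costs about $0.00056, which is well under a tenth of a cent.
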

Direct model selection

You don’t have to use auto. Pass any catalog model ID to pin a single request to that model — routing is skipped and the mode is reported as direct:
```python
client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Draft a launch announcement"}],
)
```
If the ID isn’t recognized, PRYSM falls back to a safe budget default rather than erroring.
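
The behavior described in this section can be sketched as a three-way branch. This is an illustration under stated assumptions, not PRYSM's code: the function `resolve_model`, the `"routed"` and `"fallback"` labels, and the catalog and fallback values are hypothetical. Only the `direct` mode name comes from the text above.

```python
def resolve_model(requested: str, catalog: set[str], fallback: str) -> tuple[str, str]:
    """Return (model, label) describing how a requested model ID is handled."""
    if requested == "auto":
        return ("auto", "routed")      # hand off to automatic routing
    if requested in catalog:
        return (requested, "direct")   # pinned: routing is skipped
    return (fallback, "fallback")      # unrecognized ID: use a budget default


# Hypothetical catalog and fallback, for illustration only.
catalog = {"claude-sonnet-4.5", "deepseek-v3.2"}
print(resolve_model("claude-sonnet-4.5", catalog, fallback="deepseek-v3.2"))
print(resolve_model("not-a-real-model", catalog, fallback="deepseek-v3.2"))
```

Because an unrecognized ID does not raise an error, a typo in a model name can go unnoticed; checking which model the response reports is a cheap safeguard.
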

Shaping routing with BRAIN.md

Auto-routing is a strong default, but you’re in control. A BRAIN.md file lets you pin models to specific signals, lock a single model, cap per-request cost, and block models entirely — all version-controlled alongside your code. Guardrails like cost caps and blocks always win over routing preferences; see AgentGuard.
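
The precedence rule, guardrails before preferences, can be sketched in a few lines. Nothing below reflects the real BRAIN.md format or PRYSM internals: the `Guardrails` class, the `apply_guardrails` function, and the cost figures in the usage lines are hypothetical stand-ins chosen to show the ordering only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Guardrails:
    """Hypothetical in-memory stand-in for guardrails declared in BRAIN.md."""
    blocked: set[str] = field(default_factory=set)
    max_cost_usd: Optional[float] = None  # per-request cap; None means no cap


def apply_guardrails(preferred: list[str], est_cost_usd: dict[str, float],
                     rails: Guardrails) -> Optional[str]:
    """Return the first preferred model that passes every guardrail, else None."""
    for model in preferred:
        if model in rails.blocked:
            continue  # blocks always win over preferences
        if rails.max_cost_usd is not None and est_cost_usd[model] > rails.max_cost_usd:
            continue  # so do cost caps
        return model
    return None


# Made-up preference order and cost estimates, for illustration only.
rails = Guardrails(blocked={"gpt-5.2"}, max_cost_usd=0.01)
costs = {"gpt-5.2": 0.02, "claude-sonnet-4.5": 0.015, "deepseek-v3.2": 0.001}
print(apply_guardrails(["gpt-5.2", "claude-sonnet-4.5", "deepseek-v3.2"], costs, rails))
```

In this sketch the top preference is skipped because it is blocked and the second because it exceeds the cap, so the third model is selected. What PRYSM does when no model survives the guardrails is not stated above; returning `None` here is an assumption.
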