How routing works

When you send a request with model: "auto", PRYSM inspects the prompt, classifies its intent, picks a routing mode, and selects the model that delivers the best result for the lowest cost. The decision is returned in the response’s prysm.routing block, so it’s never a black box.

Want to see the decision without paying for a completion? Call /route (or client.route(...)) for a dry run that returns the chosen model and an estimated cost.

The three modes

PRYSM routes in one of three modes. It selects a mode automatically from the prompt, and you can force one per request with routing_mode.

Quality

Complex, high-stakes work. Routes to premium/frontier models for maximum accuracy and nuance.

Balanced

The default. The best single model for the task at a sensible price.

Agility

Short, simple prompts. Routes to the fastest, cheapest capable model.

How the mode is chosen

| Condition | Mode |
| --- | --- |
| An `analysis` or `reasoning` signal and more than 20 words | Quality |
| A `simple` signal or fewer than 8 words | Agility |
| Everything else | Balanced |
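
The table can be read as a small decision function. The sketch below is a literal reading of the three rows, not PRYSM's implementation: the function name, the lowercase mode strings, and the top-to-bottom evaluation order are assumptions. Real routing weighs signals together, so it may pick a different mode than this simplified rule does.

```python
def choose_mode(signals: set[str], word_count: int) -> str:
    """Apply the table rows in order (the ordering is an assumption)."""
    # Row 1: an analysis or reasoning signal AND more than 20 words.
    if signals & {"analysis", "reasoning"} and word_count > 20:
        return "quality"
    # Row 2: a simple signal OR fewer than 8 words.
    if "simple" in signals or word_count < 8:
        return "agility"
    # Row 3: everything else.
    return "balanced"


print(choose_mode({"simple"}, 3))     # agility
print(choose_mode({"analysis"}, 25))  # quality
print(choose_mode({"write"}, 12))     # balanced
```

Note that an `analysis` prompt of 20 words or fewer does not qualify for Quality under this reading; it lands in Balanced unless it is also short enough for Agility.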

Override the automatic choice per request. The same override is shown below in Python, in JavaScript, and with curl:

client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this report"}],
    extra_body={"routing_mode": "quality"},
)

await client.complete("Summarize this report", { mode: "quality" });

curl https://api.prysm1.com/v1/chat/completions \
  -H "Authorization: Bearer $PRYSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{ "role": "user", "content": "Summarize this report" }],
    "routing_mode": "quality"
  }'

Intent signals

PRYSM detects intent from keywords and language. A prompt can match several signals at once; routing weighs them together with the word count and mode.

| Signal | Fires on prompts about… |
| --- | --- |
| `code` | functions, debugging, languages, APIs, SQL, regex, deploys |
| `write` | drafting, essays, blogs, emails, copy, editing, tone |
| `analysis` | analyze, research, compare, evaluate, strategy, reports |
| `math` | calculations, equations, statistics, proofs, calculus |
| `translate` | translation between languages |
| `realtime` | today, latest, current, news, prices, live data |
| `simple` | quick lookups, definitions, “what is”, conversions |
| `multimodal` | images, photos, diagrams, charts, video |
| `reasoning` | step-by-step logic, philosophy, ethics, debate, proofs |
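
To make the idea of overlapping signals concrete, here is a toy keyword matcher. It is an illustration only: the keyword lists are hypothetical and loosely drawn from the table, `detect_signals` is not a PRYSM function, and the real detector also considers language and is certainly more sophisticated. The `simple` signal is left out because phrases like “what is” need more than single-word matching.

```python
import re

# Hypothetical keyword lists, loosely based on the signal table.
KEYWORDS = {
    "code": {"function", "debug", "sql", "regex", "api", "deploy"},
    "write": {"draft", "essay", "blog", "email", "edit"},
    "analysis": {"analyze", "research", "compare", "evaluate", "strategy"},
    "math": {"equation", "statistics", "calculus", "proof"},
    "translate": {"translate", "translation"},
    "realtime": {"today", "latest", "current", "news", "live"},
    "multimodal": {"image", "photo", "diagram", "chart", "video"},
    "reasoning": {"step-by-step", "philosophy", "ethics", "debate", "proof"},
}


def detect_signals(prompt: str) -> set[str]:
    """Return every signal whose keyword list shares a word with the prompt."""
    words = set(re.findall(r"[a-z][a-z\-]*", prompt.lower()))
    return {signal for signal, keys in KEYWORDS.items() if words & keys}


print(sorted(detect_signals("Compare the latest SQL engines")))
# ['analysis', 'code', 'realtime']
```

A single prompt matching three signals at once is the normal case, which is why the mode and final model depend on how the signals are weighed together rather than on any one of them.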

PRYSM also detects Chinese and Japanese text and biases toward models that specialize in those languages.

The signals PRYSM actually detected for a request are echoed back in prysm.routing.signals_detected, so you can always see what drove the decision.
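
If you work with the raw JSON response, reading the echoed signals is a nested lookup. The sketch below assumes the response has been parsed into a dict and that `prysm.routing` maps to nested `prysm` and `routing` objects; the helper name and the sample payload are hypothetical, and only `signals_detected` is taken from the text above.

```python
def detected_signals(response: dict) -> list[str]:
    """Read prysm.routing.signals_detected, tolerating missing keys."""
    routing = response.get("prysm", {}).get("routing", {})
    return list(routing.get("signals_detected", []))


# Hypothetical payload containing only the field described above.
sample = {"prysm": {"routing": {"signals_detected": ["analysis", "reasoning"]}}}

print(detected_signals(sample))  # ['analysis', 'reasoning']
print(detected_signals({}))      # []
```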

Worked examples

These show the default (no BRAIN.md) behavior:

"capital of France?" → Agility

Three words and a simple signal trigger Agility. PRYSM routes to a fast budget model (e.g. deepseek-v4-flash at $0.14/MTok), because paying frontier prices for a fact lookup would be wasteful.

"write a Python function to parse CSV" → Balanced

A code signal with a moderate length lands in Balanced. Short code prompts route to deepseek-v4-flash (“95% cheaper than GPT-5.2”); longer or higher-stakes code routes to claude-sonnet-4.5 for instruction-following and quality.

"analyze the trade-offs between microservices and a monolith for a 40-person team, considering …" → Quality

An analysis signal with more than 20 words triggers Quality. PRYSM routes to a premium reasoning model (e.g. gpt-5.2) — accuracy matters more than saving a fraction of a cent here.
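
To put “a fraction of a cent” in numbers, per-request cost is the token count times the per-million-token price. This is plain arithmetic rather than PRYSM's billing logic: the 500-token request is a made-up figure, and the examples above do not say whether $0.14/MTok is an input or an output price.

```python
def request_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost in USD for `tokens` tokens at a price quoted per million tokens."""
    return tokens * price_per_mtok / 1_000_000


# Hypothetical 500-token request at the $0.14/MTok price quoted above.
cost = request_cost_usd(500, 0.14)
print(f"${cost:.6f}")  # $0.000070
```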

Direct model selection

You don’t have to use auto. Pass any catalog model ID to pin a single request to that model — routing is skipped and the mode is reported as direct:

client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Draft a launch announcement"}],
)

If the ID isn’t recognized, PRYSM falls back to a safe budget default rather than erroring.
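
Because an unrecognized ID falls back silently, it is worth checking which model actually served the request. The fallback rule can be sketched as follows; the catalog contents and the choice of `deepseek-v4-flash` as the default are assumptions for illustration, since the text above only says “a safe budget default”.

```python
# Hypothetical catalog and fallback; the real values are not stated here.
CATALOG = {"deepseek-v4-flash", "claude-sonnet-4.5", "gpt-5.2"}
FALLBACK_MODEL = "deepseek-v4-flash"


def resolve_direct(model_id: str) -> tuple[str, bool]:
    """Return (model to use, whether a fallback happened)."""
    if model_id in CATALOG:
        return model_id, False
    return FALLBACK_MODEL, True


print(resolve_direct("claude-sonnet-4.5"))  # ('claude-sonnet-4.5', False)
print(resolve_direct("claude-sonet-4.5"))   # ('deepseek-v4-flash', True)
```

A typo in a pinned model ID therefore does not fail loudly, so comparing the requested ID with the model reported in the response is a cheap safeguard.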

Shaping routing with BRAIN.md

Auto-routing is a strong default, but you’re in control. A BRAIN.md file lets you pin models to specific signals, lock a single model, cap per-request cost, and block models entirely — all version-controlled alongside your code. Guardrails like cost caps and blocks always win over routing preferences; see AgentGuard.
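
The precedence rule, guardrails over preferences, can be pictured as a filter applied before any preference is honored. The sketch below is not BRAIN.md syntax and not PRYSM's implementation: the `Guardrails` class, `pick_model`, and the cost figures are all hypothetical, and it only illustrates the ordering described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Guardrails:
    blocked: set = field(default_factory=set)  # model IDs that may never be used
    max_cost_usd: Optional[float] = None       # per-request cost cap, if any


def pick_model(preferred: list, est_cost: dict, rails: Guardrails) -> Optional[str]:
    """Return the first preferred model that passes every guardrail."""
    for model in preferred:
        if model in rails.blocked:
            continue  # blocks win over preferences
        if rails.max_cost_usd is not None and est_cost[model] > rails.max_cost_usd:
            continue  # cost caps win over preferences
        return model
    return None  # no preferred model satisfies the guardrails


# Made-up per-request cost estimates, for illustration only.
costs = {"gpt-5.2": 0.02, "claude-sonnet-4.5": 0.01, "deepseek-v4-flash": 0.001}
rails = Guardrails(blocked={"gpt-5.2"}, max_cost_usd=0.005)

print(pick_model(["gpt-5.2", "claude-sonnet-4.5", "deepseek-v4-flash"], costs, rails))
# deepseek-v4-flash
```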
