The loop
Code Mode runs four kinds of stage. Generate and review alternate until the code passes or the loop runs out of budget.
plan
Detect the target language (or honor the one you pass) and select the coder and critic
models for the policy.
generate
The coder writes the complete solution as one or more fenced files. Under the depth
policy, several diverse coders write in parallel and the best candidate is kept.
review
A separate critic model reads the code and returns an explicit verdict:
PASS/FAIL, a score, and a list of concrete issues. If it passes, the loop stops.
The response carries the passed verdict and the full per-stage trace (when
include_trace is on), so it’s never a black box.
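The alternation between generate and review can be sketched in a few lines. This is a minimal illustration, not PRYSM's implementation: `run_loop`, `Verdict`, `LoopResult`, and the stub coder and critic are hypothetical names, and the sketch assumes `max_iters` counts generate/review rounds and that the critic's issues are fed back to the coder on the next round.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Verdict:
    passed: bool
    score: float
    issues: List[str] = field(default_factory=list)


@dataclass
class LoopResult:
    code: str
    verdict: Optional[Verdict]
    iterations: int
    trace: List[str]


def run_loop(
    task: str,
    coder: Callable[[str, List[str]], str],
    critic: Callable[[str, str], Verdict],
    max_iters: int = 3,
) -> LoopResult:
    """Alternate generate and review until the critic passes or the budget runs out."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    trace: List[str] = ["plan"]          # model selection would happen here
    issues: List[str] = []
    code = ""
    verdict: Optional[Verdict] = None
    for i in range(1, max_iters + 1):
        code = coder(task, issues)       # generate, using any earlier feedback
        trace.append("generate")
        verdict = critic(task, code)     # review
        trace.append("review")
        if verdict.passed:
            return LoopResult(code, verdict, i, trace)
        issues = verdict.issues          # assumed: issues drive the next draft
    return LoopResult(code, verdict, max_iters, trace)


# Stub models for illustration only: the first draft fails, the second passes.
def stub_coder(task: str, issues: List[str]) -> str:
    return "fixed draft" if issues else "first draft"


def stub_critic(task: str, code: str) -> Verdict:
    if code == "fixed draft":
        return Verdict(True, 0.9)
    return Verdict(False, 0.4, ["handle empty input"])


result = run_loop("reverse a string", stub_coder, stub_critic)
```

With these stubs the loop stops on the second round, because the critic's first FAIL supplies the issue that the coder then addresses.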
Policies: the objective dial
A policy says what you’re optimizing for. It chooses how many coders run and how strong the critic is.
efficiency
One cheap, capable coder and a lightweight critic. The fewest tokens that still gets a
reviewed, repaired solution.
balanced
The default. A solid mid-tier coder paired with a strong reasoning critic — the right
trade-off for everyday code generation.
depth
Maximum robustness. Several diverse coders (different providers) write in parallel,
the best candidate is kept, and the strongest available critic reviews it.
| Policy | Coders | Critic | Default max_iters |
|---|---|---|---|
| efficiency | 1 (cheapest capable) | lightweight | 3 |
| balanced | 1 (mid-tier) | strong reasoning model | 3 |
| depth | several, diverse providers | strongest available | 3 |
Single-shot mode
Set review: false to skip the critic loop entirely: one coder, one draft, no repair.
It’s the fastest, cheapest shape, for when you just want a quick generation and will review
it yourself.
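As a rough sketch, a request body for the policies and for single-shot mode might be assembled as below. Only `review`, `max_iters`, and `include_trace` are names taken from this page; the `task`, `policy`, and `language` keys and the `build_code_request` helper are assumptions for illustration, and omitting `max_iters` when review is off is a choice made here, not documented behavior.

```python
from typing import Any, Dict, Optional

POLICIES = ("efficiency", "balanced", "depth")


def build_code_request(
    task: str,
    policy: str = "balanced",
    language: Optional[str] = None,
    review: bool = True,
    max_iters: int = 3,
    include_trace: bool = False,
) -> Dict[str, Any]:
    """Assemble a hypothetical JSON body for POST /v2/code."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy {policy!r}; expected one of {POLICIES}")
    body: Dict[str, Any] = {
        "task": task,                  # assumed key name
        "policy": policy,              # assumed key name
        "review": review,
        "include_trace": include_trace,
    }
    if language is not None:
        body["language"] = language    # assumed key; omitted means auto-detect
    if review:
        body["max_iters"] = max_iters  # only meaningful when the critic loop runs
    return body


# Single-shot: one coder, one draft, no critic loop.
single_shot = build_code_request("parse an ISO 8601 date", review=False)
```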
Overriding the models
PRYSM picks sensible coder and critic models for the policy, but you can force either one.
Confidence and the verdict
The critic returns an explicit VERDICT: PASS|FAIL and a SCORE between 0 and 1. PRYSM
blends that score with the running confidence so the final number reflects the whole loop,
not just the last turn. The verdict, score, and the concrete issues the critic raised are
all returned on the response:
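To make the verdict and blending concrete, here is a minimal sketch of parsing a critic reply and folding its score into a running confidence. The page does not specify the exact reply layout or the blending formula, so both are assumptions: the sketch expects `VERDICT:` and `SCORE:` on their own lines and uses a simple weighted average. `parse_verdict` and `blend` are hypothetical helpers.

```python
import re
from typing import NamedTuple


class Verdict(NamedTuple):
    passed: bool
    score: float


_VERDICT_RE = re.compile(r"^\s*VERDICT:\s*(PASS|FAIL)\s*$", re.MULTILINE)
_SCORE_RE = re.compile(r"^\s*SCORE:\s*(\d*\.?\d+)\s*$", re.MULTILINE)


def parse_verdict(critic_text: str) -> Verdict:
    """Extract the PASS/FAIL verdict and the score from a critic reply."""
    verdict_match = _VERDICT_RE.search(critic_text)
    score_match = _SCORE_RE.search(critic_text)
    if verdict_match is None or score_match is None:
        raise ValueError("critic reply lacks a VERDICT or SCORE line")
    # Clamp to the documented range of 0 to 1.
    score = min(1.0, max(0.0, float(score_match.group(1))))
    return Verdict(verdict_match.group(1) == "PASS", score)


def blend(running: float, score: float, weight: float = 0.5) -> float:
    """Assumed blending rule: weighted average of running confidence and new score."""
    return (1.0 - weight) * running + weight * score


reply = "VERDICT: FAIL\nSCORE: 0.4\n- does not handle empty input"
verdict = parse_verdict(reply)
confidence = blend(running=1.0, score=verdict.score)
```

A weighted average is only one way to let earlier rounds influence the final number; the actual weighting PRYSM uses may differ.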
PrysmProof v2
Every run is hashed into a PrysmProof v2: a SHA-256 that binds the "prysm-code-v1" tag, the policy, the language, the task itself, and every produced
file (path + content hash) together with the execution stages. Because the task and the
files are part of the hash, the proof attests to this code solving this task — change
either and the hash changes. It’s logged to the same store as v1, so you can verify any run
later:
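The binding described above can be illustrated with a short sketch. The inputs (tag, policy, language, task, per-file path and content hash, stages) follow this page, but the serialization is an assumption: the sketch uses canonical JSON with sorted keys, which is not confirmed to be PRYSM's encoding, so it will not reproduce real PrysmProof values. `proof_hash` and `content_hash` are hypothetical helpers.

```python
import hashlib
import json
from typing import Dict, List

TAG = "prysm-code-v1"


def content_hash(text: str) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def proof_hash(
    policy: str,
    language: str,
    task: str,
    files: Dict[str, str],
    stages: List[str],
) -> str:
    """Bind tag, policy, language, task, files, and stages into one SHA-256."""
    record = {
        "tag": TAG,
        "policy": policy,
        "language": language,
        "task": task,
        # Sort by path so the digest does not depend on dict ordering.
        "files": [
            {"path": path, "sha256": content_hash(body)}
            for path, body in sorted(files.items())
        ],
        "stages": stages,
    }
    # Assumed canonical form: compact JSON with sorted keys.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


stages = ["plan", "generate", "review"]
files = {"main.py": "print('hi')\n"}
original = proof_hash("balanced", "python", "print a greeting", files, stages)
changed_task = proof_hash("balanced", "python", "print a farewell", files, stages)
changed_file = proof_hash("balanced", "python", "print a greeting", {"main.py": "print('yo')\n"}, stages)
```

Verification under this scheme would mean recomputing the digest from the stored inputs and comparing it with the logged value; changing either the task or any file yields a different digest.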
Safety: PRYSM never runs your code
Generating code is safe; executing untrusted, model-written code on a server is not. So Code Mode’s default critic is a pure LLM reviewer: it reasons about correctness, edge cases, and style from the source alone. Nothing PRYSM ships runs the generated code. The reviewer is a pluggable interface: a deployment that owns a real, isolated sandbox can supply its own runner to execute tests, but that’s an explicit, self-hosted choice, never the hosted default.
When to use Code Mode
Use chat / routing
Quick snippets, explanations, or code you’ll review yourself. One model, lowest cost,
OpenAI drop-in.
Use Code Mode (/v2/code)
Self-contained tasks where you want a reviewed, repaired solution — and a verifiable
proof of how it was produced.
Code Mode is available in both SDKs (client.code(...)), the CLI (prysm code), and
directly via POST /v2/code.