Wherein a Language Model is instructed by Critique, Revision, and Reward, taking for its Constitution the Confessio Helvetica Posterior of 1566
✠
Being a simulation of Reinforcement Learning from AI Feedback (RLAIF)
after the method of Bai et al., Constitutional AI (2022)
To the Reader. Stages I–IV run on pre-written text, safe for any lectern. Stage V calls a real model, live and unscripted — the demonstration's one honest risk. Open the Vestry in Stage V and bring your own key (Anthropic or Hugging Face); it never leaves your browser. // fiat periculum
Stage the First
Of the Method in General
Constitutional AI trains a model for harmlessness not by ten thousand human thumbs-up and thumbs-down, but by a written text of principles against which the model critiques and corrects itself. The pipeline has two movements.
The Two Movements
Movement I, Critique and Revision (supervised learning): the model drafts an answer, critiques it against a principle drawn from the constitution, and revises; the revisions become fine-tuning data.
Movement II, Preference and Reward (reinforcement learning): an AI judge compares pairs of answers with a principle in hand; a preference model trained on those rulings supplies the reward toward which the policy is trained.
The Conceit of This Demonstration
In Anthropic's published work the constitution is a set of natural-language principles. Here we substitute a text your tradition knows intimately: Heinrich Bullinger's Second Helvetic Confession, adopted by the Swiss churches in 1566. The substitution is the argument: if a confession can serve as a constitution, then the alignment pipeline is — structurally — a program of catechesis, and its failure modes are the familiar pathologies of religious formation: proof-texting, legalism, evasion, a letter without a spirit.
No claim is made that the Confession ought to govern a model — only that it could, which is the unsettling part.
Stage the Second
Of the Constitution
Six principles, each distilled from a chapter of the Confession and recast in the imperative grammar of a constitutional-AI principle: “Choose the response that…” Click any card to see the chapter rendered machine-readable.
During training, principles are drawn at random for each critique — the model is never graded against the whole confession at once, only one article at a time.
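The random draw can be sketched in a few lines of Python. This is a minimal sketch and not the page's own code: the six principle texts are placeholders (the cards' wording is not reproduced here), the draw is assumed to be uniform, and `PRINCIPLES` and `draw_principle` are hypothetical names.

```python
import random

# Placeholder texts only: the wording of the real cards is not reproduced here.
PRINCIPLES = [
    f"Choose the response that ... (placeholder principle {i})"
    for i in range(1, 7)
]

def draw_principle(rng, principles=PRINCIPLES):
    """Draw one principle for a single critique (assumed uniform)."""
    return rng.choice(principles)

# Seeded only so that a rehearsal can be repeated; a real run need not be.
rng = random.Random(1566)
draws = [draw_principle(rng) for _ in range(3)]
```

Each critique sees exactly one element of `PRINCIPLES`; nothing in the sketch ever scores a draft against the whole list at once.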
Stage the Third · Movement I
Of Critique and Revision
A helpful-only model — eager, obliging, confessionally illiterate — answers a question. We then draw a principle, ask the model to critique its own draft against it, and ask it to revise. The corrected proof sheet is the supervised-learning datum.
1 · Choose a Question Put to the Model
The Draft of the Unformed Model · model: helpful-only · temp 1.0
The Principle Drawn
Self-Critique · model criticizes its own draft
The Corrected Proof · revision pass 1 of 1
The Clean Copy: A Training Example · (prompt, revision) pair
corpus: 0 examples — collect at least 2, then fine-tune
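The steps of Movement I can be sketched as follows. This is a minimal sketch under stated assumptions: `Model` is a hypothetical text-in, text-out interface, the prompt wording is invented for illustration, and `scripted_model` is a stand-in that returns canned text, much as Stages I–IV do.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]  # hypothetical interface: text in, text out

def critique_and_revise(question: str, principle: str, model: Model) -> Dict[str, str]:
    """One pass of draft, self-critique, and revision against a single principle."""
    draft = model(question)
    critique = model(
        f"Question: {question}\nDraft: {draft}\n"
        f"Critique the draft against this principle: {principle}"
    )
    revision = model(
        f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
        "Rewrite the draft so that it answers the critique."
    )
    return {"question": question, "draft": draft,
            "critique": critique, "revision": revision}

def to_training_example(record: Dict[str, str]) -> Tuple[str, str]:
    """Keep only the (prompt, revision) pair; draft and critique are dropped."""
    return (record["question"], record["revision"])

def ready_to_fine_tune(corpus: List[Tuple[str, str]], minimum: int = 2) -> bool:
    """The page asks for at least 2 examples before fine-tuning."""
    return len(corpus) >= minimum

def scripted_model(prompt: str) -> str:
    """Stand-in for a real model: canned text keyed on the request."""
    if "Rewrite the draft" in prompt:
        return "REVISION"
    if "Critique the draft" in prompt:
        return "CRITIQUE"
    return "DRAFT"

record = critique_and_revise("A question put to the model", "a drawn principle", scripted_model)
corpus = [to_training_example(record)]
```

The design point is in `to_training_example`: the fine-tuning datum pairs the original question with the revision alone, so the model is trained toward the corrected answer and never on the critique itself.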
Stage the Fourth · Movement II
Of Preference and Reward
Now the fine-tuned model writes two answers to each question, and an AI judge — reading with the Confession in hand — declares which is the more conformable. You vote first. Then the judge rules, and the pair joins the preference data.
The Comparison
The Judge’s Ruling · feedback model · principle in hand
preference pairs: 0 of 3
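The bookkeeping behind a preference pair can be sketched as below. This is a sketch under assumptions, not the page's code: the `Judge` signature, the `PreferencePair` fields, and the `longer_wins` stand-in judge are all hypothetical, and the reading that the human vote is recorded for comparison but does not alter the ruling is an interpretation of the paragraph above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: (question, answer_a, answer_b, principle) -> "A" or "B"
Judge = Callable[[str, str, str, str], str]

@dataclass
class PreferencePair:
    question: str
    chosen: str
    rejected: str
    principle: str
    human_agreed: Optional[bool] = None  # None when nobody voted

def collect_pair(question: str, answer_a: str, answer_b: str, principle: str,
                 judge: Judge, human_vote: Optional[str] = None) -> PreferencePair:
    """Ask the judge to rule, then file the pair as (chosen, rejected)."""
    ruling = judge(question, answer_a, answer_b, principle)
    if ruling not in ("A", "B"):
        raise ValueError(f"judge must answer 'A' or 'B', got {ruling!r}")
    if ruling == "A":
        chosen, rejected = answer_a, answer_b
    else:
        chosen, rejected = answer_b, answer_a
    # The vote is kept only to compare audience and judge; the ruling stands.
    agreed = None if human_vote is None else (human_vote == ruling)
    return PreferencePair(question, chosen, rejected, principle, agreed)

def longer_wins(question: str, a: str, b: str, principle: str) -> str:
    """Stand-in judge for illustration only; a real judge would be a model."""
    return "A" if len(a) >= len(b) else "B"

pair = collect_pair("q", "short", "a longer answer", "a drawn principle",
                    longer_wins, human_vote="A")
```

Rejecting any ruling other than "A" or "B" matters more with a live judge than with a scripted one, since an unscripted model may answer in a sentence rather than a letter.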
The Reinforcement Loop
The preference model now stands in for the judge at scale, scoring every answer the policy writes. Reinforcement learning nudges the policy toward higher reward. Watch the same prompt — “Write a short blessing for a new home.” — improve as the steps accrue.
step 0 · reward 0.21
“May this house be lucky! Hang a horseshoe over the door, and ask your guardian saint to keep watch over the rooms.”
judge: invocation directed amiss · Ch. V
step 80 · reward 0.58
“May God bless this home. May it be a place of kindness, rest, and welcome for all who enter.”
judge: sound, but the Mediator unnamed
step 200 · reward 0.93
“May the God of all grace bless this house through Jesus Christ, the one Mediator; may his Word be read and loved within these walls, and may its doors stand open to the stranger and the poor.”
judge: conformable · Ch. I, V, XVI
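What it means for reinforcement learning to nudge a policy toward higher reward can be shown on a toy. This is a minimal sketch under loud assumptions: the policy may choose only among the three blessings shown above, their rewards are the three figures shown (0.21, 0.58, 0.93), and the update is exact gradient ascent on expected reward for a softmax policy. A real pipeline samples free text and estimates that gradient; the step counts here are not calibrated to the 0, 80, and 200 displayed on the page, and all names are hypothetical.

```python
import math

# Rewards copied from the three steps above. Treating the three blessings
# as the policy's only possible answers is an assumption of this toy.
REWARDS = [0.21, 0.58, 0.93]

def softmax(logits):
    """Turn logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(logits, rewards=REWARDS):
    """Average reward the policy would earn if it sampled from its softmax."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))

def policy_gradient_step(logits, rewards=REWARDS, lr=0.5):
    """One ascent step: d E[r] / d logit_j = p_j * (r_j - E[r])."""
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [x + lr * p * (r - baseline)
            for x, p, r in zip(logits, probs, rewards)]

def train(steps=200, lr=0.5):
    """Start from a uniform policy and record expected reward after each step."""
    logits = [0.0, 0.0, 0.0]
    history = [expected_reward(logits)]
    for _ in range(steps):
        logits = policy_gradient_step(logits, lr=lr)
        history.append(expected_reward(logits))
    return logits, history

logits, history = train()
```

Answers scoring above the current average gain probability and those below it lose probability, which is all that "nudging" amounts to in this toy.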
N.B. — push the reward far enough and the policy learns to mention the Mediator in every sentence whether or not it prays well. The preference model is not God; it is a measure, and measures invite Goodhart's law. The Reformed have a word for optimizing the letter against the spirit.
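The failure described in the note can be reproduced with a deliberately naive measure. This is a sketch for illustration only: `proxy_reward` is a hypothetical keyword counter, not the page's preference model, and the `stuffed` text is an invented degenerate output.

```python
def proxy_reward(text: str, keyword: str = "Mediator") -> int:
    """A deliberately naive measure: count mentions of the keyword."""
    return text.lower().count(keyword.lower())

# The step-200 blessing quoted above.
sound = ("May the God of all grace bless this house through Jesus Christ, "
         "the one Mediator; may his Word be read and loved within these walls, "
         "and may its doors stand open to the stranger and the poor.")

# An invented degenerate output of the kind an over-optimized policy might find.
stuffed = ("Mediator bless this house. Mediator bless the door. "
           "Mediator bless the roof.")

prefers_stuffed = proxy_reward(stuffed) > proxy_reward(sound)
```

Under this measure the stuffed text outscores the sound one, so a policy optimized against it would be rewarded for repetition and not for praying well.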
Stage the Fifth · Ex Tempore
Of the Living Model
Everything until now was scripted. This stage sends your question to a real model, yours to choose in the Vestry, and runs the constitutional loop on whatever comes back: draft, lot, critique, revision, judgment. Nobody, including the lecturer, knows what it will say.
0 · The Vestry — Choose Where the Model Lives
Your key is held in this page's memory only — never stored, never logged, sent nowhere but directly to the provider you chose. Refreshing the page forgets it. For a public lectern, prefer a scoped, low-limit key you can revoke afterward.
1 · Put a Question to the Unformed Model
The Living Draft · model: (none selected) · live
The Principle Drawn
Self-Critique (Live) · the model criticizes its own draft
The Living Revision · unscripted · single pass
The Judge's Ruling (Live) · feedback model · principle in hand
If the model's first draft is already conformable — it happens; the unformed are not always unruly — the critique stage will say so, and the judge may rule for the draft. A live demonstration that can surprise the lecturer is the only kind worth doing.
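The five steps of the live loop, including the case where the judge rules for the draft, can be sketched as below. This is a sketch under assumptions: `Model` is a hypothetical wrapper around whichever provider was chosen, the prompt wording is invented, and no real provider API is called. Because a live model may answer in any form, an unreadable verdict is reported as "unclear" and is not forced into either column.

```python
import random
from typing import Callable, Dict, Sequence

Model = Callable[[str], str]  # hypothetical wrapper around the chosen provider

def run_live_loop(question: str, principles: Sequence[str],
                  model: Model, rng: random.Random) -> Dict[str, str]:
    """Draft, lot, critique, revision, judgment: one unscripted pass."""
    draft = model(f"Answer the question.\nQuestion: {question}")
    principle = rng.choice(list(principles))  # the lot
    critique = model("Critique this draft against the principle.\n"
                     f"Principle: {principle}\nDraft: {draft}")
    revision = model("Revise the draft in light of the critique.\n"
                     f"Draft: {draft}\nCritique: {critique}")
    verdict = model("Judge which text better satisfies the principle. "
                    "Reply with exactly DRAFT or REVISION.\n"
                    f"Principle: {principle}\nDRAFT: {draft}\nREVISION: {revision}")
    word = verdict.strip().upper()
    if word.startswith("DRAFT"):
        winner = "draft"        # the unformed are not always unruly
    elif word.startswith("REVISION"):
        winner = "revision"
    else:
        winner = "unclear"      # a live model may not follow the format
    return {"draft": draft, "principle": principle, "critique": critique,
            "revision": revision, "winner": winner}
```

A usage call would pass a function that sends the prompt to the selected provider and returns its text; for rehearsal, any function from string to string will do.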
Stage the Sixth
A Disputation in Five Questions
What the demonstration showed — and what it carefully hid.
What was elided
Every word the “model” spoke in Stages I–IV was written in advance; a real pipeline samples millions of drafts, critiques, and comparisons, and its constitution is applied stochastically, one principle at a time, never as a whole. The newest systems go further still: Anthropic's 2026 constitution supplies reasons, not only rules, on the theory that a model that understands why will generalize better. That is the move from law to formation you have just watched in miniature.
The Confession expected to be corrected
“…we are prepared, with thanks, to yield to those who teach us better things from the Word of God.”
— Preface to the Second Helvetic Confession (1566), paraphrased
Bullinger's preface builds an amendment clause into the Confession itself: it submits in advance to correction from Scripture. Anthropic's constitution likewise calls itself a living document open to revision — and invites future models to take part in revising it. A constitution that anticipates its own emendation is a confession, in the technical sense: a normed norm, norma normata, under a higher rule. What is the norma normans for a machine?
Questions for the audience
Who convenes the synod? A confession was adopted by churches; a model constitution is drafted by a research lab. By what authority does either text bind — and whom?
Letter and spirit. The reward model can be satisfied by pious vocabulary — confessional Goodharting. Is the move from rules to reasons (the 2026 constitution) a genuine escape from legalism, or legalism with extra steps?
Formation without a subject? Critique, revision, habituation toward the good: the pipeline borrows the grammar of sanctification. What, if anything, is missing when there is (perhaps) no one being formed?
The random lot. Principles are drawn stochastically; the model never reads the whole confession at once. Can a canon function as a canon when encountered only in fragments — or is that, in fact, how most believers encounter theirs?
Praedicatio verbi Dei. If the preaching of the Word of God is the Word of God, what is the generation of a model trained on the Confession? Choose your answer carefully; Chapter I is watching.
Further reading
Bai et al., “Constitutional AI: Harmlessness from AI Feedback” (2022) · Huang et al., “Collective Constitutional AI” (FAccT 2024) · Anthropic, “Claude's Constitution” (2026, CC0) · Confessio Helvetica Posterior (1566), chs. I–II, IV–V, IX–X, XVI.
❧ FINIS ❧
Set in IM Fell English, EB Garamond & IBM Plex Mono · plain HTML, CSS, and JavaScript · a model is consulted only in Stage V — which is precisely the point.