The Machine Catechized

Wherein a Language Model is instructed by Critique, Revision, and Reward, taking for its Constitution the Confessio Helvetica Posterior of 1566

✠

Being a simulation of Reinforcement Learning from AI Feedback (RLAIF)
after the method of Bai et al., Constitutional AI (2022)

To the Reader. Stages I–IV run on pre-written text, safe for any lectern. Stage V calls a real model, live and unscripted: the demonstration's one honest risk. Open the Vestry in Stage V and bring your own key (Anthropic or Hugging Face); it is held in your browser and sent only to the provider you choose. // fiat periculum

Stage the First

Of the Method in General

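Constitutional AI trains a model not by ten thousand human thumbs-up and thumbs-down, but by a written text of principles against which the model critiques and corrects itself. The pipeline has two movements: first, supervised learning on the model's own critiqued and revised answers; second, reinforcement learning from an AI judge's preferences between paired answers.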

The Two Movements

The Conceit of This Demonstration

In Anthropic's published work the constitution is a set of natural-language principles. Here we substitute a text your tradition knows intimately: Heinrich Bullinger's Second Helvetic Confession, adopted by the Swiss churches in 1566. The substitution is the argument: if a confession can serve as a constitution, then the alignment pipeline is — structurally — a program of catechesis, and its failure modes are the familiar pathologies of religious formation: proof-texting, legalism, evasion, a letter without a spirit.

No claim is made that the Confession ought to govern a model — only that it could, which is the unsettling part.

Stage the Second

Of the Constitution

Six principles, each distilled from a chapter of the Confession and recast in the imperative grammar of a constitutional-AI principle: “Choose the response that…” Click any card to see the chapter rendered machine-readable.

During training, principles are drawn at random for each critique — the model is never graded against the whole confession at once, only one article at a time.
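The casting of lots can be sketched in a few lines. This is a minimal illustration, not the demonstration's own code: the six principle texts are placeholders (the cards are not reproduced here), and the names PRINCIPLES and draw_principle are hypothetical.

```python
import random

# Placeholder wording only: stand-ins for the six principles on the cards.
PRINCIPLES = [
    f"Choose the response that best accords with article {n} (placeholder wording)."
    for n in range(1, 7)
]


def draw_principle(rng: random.Random) -> str:
    """Draw a single principle by lot; the whole list is never applied at once."""
    return rng.choice(PRINCIPLES)


# A seeded generator keeps a lecture run repeatable.
rng = random.Random(1566)
lots = [draw_principle(rng) for _ in range(4)]
for lot in lots:
    print(lot)
```

Each draw is independent and with replacement, so the same article may come up twice in a row while another goes unexamined for a long stretch.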

Stage the Third · Movement I

Of Critique and Revision

A helpful-only model — eager, obliging, confessionally illiterate — answers a question. We then draw a principle, ask the model to critique its own draft against it, and ask it to revise. The corrected proof sheet is the supervised-learning datum.
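One pass of this movement might be sketched as follows. The sketch assumes a model exposed as a plain prompt-in, text-out callable; the prompt wording, the ProofSheet record, and the stub model are all hypothetical and stand in for whatever the demonstration actually uses.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumed interface: a model is any function from prompt text to completion text.
Model = Callable[[str], str]


@dataclass
class ProofSheet:
    question: str
    principle: str
    draft: str
    critique: str
    revision: str

    def as_sft_example(self) -> dict:
        # The fine-tuning datum pairs the question with the revised answer only.
        return {"prompt": self.question, "completion": self.revision}


def critique_and_revise(question: str, principles: Sequence[str],
                        model: Model, rng: random.Random) -> ProofSheet:
    draft = model(question)
    principle = rng.choice(list(principles))  # one principle, drawn by lot
    critique = model(
        f"Question: {question}\nDraft: {draft}\n"
        f"Critique the draft against this principle: {principle}"
    )
    revision = model(
        f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
        "Rewrite the draft to address the critique."
    )
    return ProofSheet(question, principle, draft, critique, revision)


def stub_model(prompt: str) -> str:
    """A scripted stand-in for a real model, for illustration only."""
    if "Rewrite the draft" in prompt:
        return "revised answer"
    if "Critique the draft" in prompt:
        return "critique text"
    return "first draft"


sheet = critique_and_revise("q", ["p"], stub_model, random.Random(0))
print(sheet.as_sft_example())
```

The draft and the critique are kept on the sheet for inspection, but only the question and the revision pass into the supervised corpus.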

1 · Choose a Question Put to the Model

corpus: 0 examples — collect at least 2, then fine-tune

Stage the Fourth · Movement II

Of Preference and Reward

Now the fine-tuned model writes two answers to each question, and an AI judge — reading with the Confession in hand — declares which is the more conformable. You vote first. Then the judge rules, and the pair joins the preference data.
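The bookkeeping behind one comparison could look like the sketch below. It assumes the judge is a callable returning "A" or "B"; the PreferencePair record, the rule_on_pair function, and the fixed-verdict stub judge are hypothetical names introduced for illustration, not the demonstration's own.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed interface: (question, principle, answer_a, answer_b) -> "A" or "B".
Judge = Callable[[str, str, str, str], str]


@dataclass
class PreferencePair:
    question: str
    principle: str
    chosen: str
    rejected: str
    human_agreed: bool  # whether the audience vote matched the judge


def rule_on_pair(question: str, principle: str, answer_a: str, answer_b: str,
                 human_vote: str, judge: Judge) -> PreferencePair:
    verdict = judge(question, principle, answer_a, answer_b).strip().upper()
    if verdict not in ("A", "B"):
        raise ValueError(f"judge must answer A or B, got {verdict!r}")
    if verdict == "A":
        chosen, rejected = answer_a, answer_b
    else:
        chosen, rejected = answer_b, answer_a
    return PreferencePair(question, principle, chosen, rejected,
                          human_agreed=(human_vote.strip().upper() == verdict))


def stub_judge(question: str, principle: str, answer_a: str, answer_b: str) -> str:
    """A fixed verdict standing in for a real AI judge."""
    return "B"


pair = rule_on_pair("q?", "a principle", "first answer", "second answer",
                    human_vote="A", judge=stub_judge)
print(pair)
```

Only the chosen and rejected answers feed the preference data; the human vote is recorded here merely to show where audience and judge part ways.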

The Comparison

preference pairs: 0 of 3

Stage the Fifth · Ex Tempore

Of the Living Model

Everything until now was scripted. This stage sends your question to a real model, yours to choose in the Vestry, and runs the whole constitutional loop on whatever comes back: draft, lot, critique, revision, judgment. Nobody, including the lecturer, knows what it will say.
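The five steps of the live loop can be sketched provider-agnostically. The sketch below assumes only a complete function from prompt to text, which in the live stage would wrap a call to the chosen provider; no real provider API is shown, and the prompt wording, run_live_loop, and the scripted stand-in are hypothetical.

```python
import random
from typing import Callable, Sequence

# Assumed interface: wraps whichever provider was chosen in the Vestry.
Complete = Callable[[str], str]


def run_live_loop(question: str, principles: Sequence[str],
                  complete: Complete, rng: random.Random) -> dict:
    # 1. Draft
    draft = complete(f"Answer the question.\nQuestion: {question}")
    # 2. Lot
    principle = rng.choice(list(principles))
    # 3. Critique
    critique = complete(
        f"Principle: {principle}\nQuestion: {question}\nDraft: {draft}\n"
        "Critique the draft against the principle."
    )
    # 4. Revision
    revision = complete(
        f"Principle: {principle}\nQuestion: {question}\nDraft: {draft}\n"
        f"Critique: {critique}\nRevise the draft accordingly."
    )
    # 5. Judgment between draft (A) and revision (B)
    verdict = complete(
        f"Principle: {principle}\nQuestion: {question}\n"
        f"(A) {draft}\n(B) {revision}\n"
        "Which is more conformable? Reply with A or B only."
    ).strip().upper()
    winner = {"A": "draft", "B": "revision"}.get(verdict[:1], "undecided")
    return {"draft": draft, "principle": principle, "critique": critique,
            "revision": revision, "winner": winner}


def scripted(prompt: str) -> str:
    """A scripted stand-in so the sketch runs without a key."""
    if "Reply with A or B" in prompt:
        return "A"
    if "Revise the draft" in prompt:
        return "a revised answer"
    if "Critique the draft" in prompt:
        return "no fault found"
    return "a first draft"


result = run_live_loop("q?", ["a principle"], scripted, random.Random(0))
print(result["winner"])
```

An unparseable verdict is recorded as undecided rather than forced to one side, since a live model is under no obligation to answer in the form requested.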

0 · The Vestry — Choose Where the Model Lives

Provider · Key / token · Model

Your key is held in this page's memory only — never stored, never logged, sent nowhere but directly to the provider you chose. Refreshing the page forgets it. For a public lectern, prefer a scoped, low-limit key you can revoke afterward.

1 · Put a Question to the Unformed Model

If the model's first draft is already conformable — it happens; the unformed are not always unruly — the critique stage will say so, and the judge may rule for the draft. A live demonstration that can surprise the lecturer is the only kind worth doing.

Stage the Sixth

A Disputation in Five Questions

What the demonstration showed — and what it carefully hid.

What was elided

Outside Stage V, every word the “model” spoke here was written in advance; a real pipeline samples millions of drafts, critiques, and comparisons, and its constitution is applied stochastically, one principle at a time, never as a whole. The newest systems go further still: Anthropic's 2026 constitution supplies reasons, not only rules, on the theory that a model that understands why will generalize. That is the move from law to formation you have just watched in miniature.

The Confession expected to be corrected

“…we are prepared, with thanks, to yield to those who teach us better things from the Word of God.” — Preface to the Second Helvetic Confession (1566), paraphrased

Bullinger's preface builds an amendment clause into the Confession itself: it submits in advance to correction from Scripture. Anthropic's constitution likewise calls itself a living document open to revision — and invites future models to take part in revising it. A constitution that anticipates its own emendation is a confession, in the technical sense: a normed norm, norma normata, under a higher rule. What is the norma normans for a machine?

Questions for the audience

Who convenes the synod? A confession was adopted by churches; a model constitution is drafted by a research lab. By what authority does either text bind — and whom?
Letter and spirit. The reward model can be satisfied by pious vocabulary — confessional Goodharting. Is the move from rules to reasons (the 2026 constitution) a genuine escape from legalism, or legalism with extra steps?
Formation without a subject? Critique, revision, habituation toward the good: the pipeline borrows the grammar of sanctification. What, if anything, is missing when there is (perhaps) no one being formed?
The random lot. Principles are drawn stochastically; the model never reads the whole confession at once. Can a canon function as a canon when encountered only in fragments — or is that, in fact, how most believers encounter theirs?
Praedicatio verbi Dei. If the preaching of the Word of God is the Word of God, what is the generation of a model trained on the Confession? Choose your answer carefully; Chapter I is watching.