Anthropic is letting clergy and philosophers help write Claude's conscience
The lab is treating moral formation as a real engineering input. The open question is whether anyone outside Anthropic can tell if it works.
The lab is treating moral formation as a real engineering input. The open question is whether anyone outside Anthropic can tell if it works.
Anthropic spent the spring of 2026 pulling clergy, philosophers, and ethicists from more than fifteen religious and cross-cultural traditions into a closed-door conversation about what Claude should value. The company announced the workstream on May 19, framing it as a formal expansion of the dialogue that produced Claude's constitution, the public document that lists the principles the model is trained to follow.
The pitch is that "moral formation," the long, slow work of shaping character, can be treated as a real input into frontier AI rather than a softer add-on to technical safety. Anthropic says the work informs the constitution's content, the values used in training, and the evaluation behaviors that decide whether a deployment looks aligned or not. The honest framing is that this is a governance bet, not a product launch. The bet is that lab ethics work survives a legitimacy test only if outsiders can see what was argued, what was rejected, and what changed.
A first-person account from venture capitalist Jenna Nicholas, published by Religion News Service on May 22, describes a two-day April gathering that included former U.S. Surgeon General Vivek Murthy, technologists, ethicists, and theologians. Nicholas, who leads faith-aligned investing at LightPost Capital and writes from a Bahá'í frame, calls the meeting "the frontier is also an ancient frontier." That is her argument, not Anthropic's stated thesis, but it captures the unusual positioning: a frontier lab reaching for vocabulary and practice that predates it by centuries.
The mechanism Anthropic is most willing to put on the record is a small one. In one experiment, a model in the middle of a task is given access to a "safe other" tool, a brief reminder of the model's ethical commitments that the model can pull up at consequential moments. Anthropic reports that the tool was reached for at decision points and that pulling it up correlated with lower misaligned behavior on several internal alignment evaluations. The result is company-internal. It is not peer-reviewed. The right way to read it is that Anthropic is treating self-reminder, the way a person might pause to check a personal rule, as a tractable engineering surface, and reporting positive early signal.
That is the part of the workstream the company itself describes. The harder parts are the ones a reader cannot audit yet. Who actually attended. Which traditions were invited and which were not. Which arguments landed in the constitution and which were logged and lost. Whether a recommendation from a consulted ethicist can survive a commercial decision Anthropic wants to make, and whether the public will ever see the disagreement if it did not.
Two registered reports add texture, even though their full text is not in hand. A National Catholic Reporter headline names Pope Leo and an Anthropic co-founder presenting an encyclical on AI together, and a Fast Company headline groups OpenAI with Anthropic in a "faith AI covenant" frame. Both pieces returned access errors on retrieval for this article, so the specifics, the date, the venue, the co-founder, and the actual OpenAI involvement, are not yet verified. They are reasonable evidence that the workstream has institutional gravity. They are not yet evidence of the specific claims their headlines carry, and the Vatican and OpenAI links should be treated as pointers, not as facts, until the underlying text is read.
The reading that respects the source is that Anthropic is doing three things at once. It is borrowing language from moral and contemplative traditions because the work of shaping an AI's character genuinely resembles that work, and the company is willing to say so. It is treating the constitution as a citable artifact, which is more than most labs do, since it gives outsiders something to argue with. And it is making a structural claim: that pluralistic ethics input, done over months rather than weeks, is a legitimate input into how a frontier model behaves.
The risk is that this becomes a sophisticated form of ethics-washing. The shape of that risk is familiar. A company convenes a group of credible outsiders, names the engagement in a press post, publishes a list of who attended, and then declines to publish the disagreements. The constitution updates. The product ships. The consultation becomes a citation in a press release rather than a check on the company's choices.
The test for whether this workstream counts as a governance model rather than a legitimacy play is concrete, and Anthropic's own framing implies it. It would look like published consultation summaries, with attribution, that name the issues where the consulted parties disagreed with Anthropic's preferred direction. It would look like behavioral deltas, with measurement, that show a constitution change moved model behavior in the direction the consultation asked for. It would look like a third-party reviewer, with teeth, who can read the unredacted record. It would look like a path for the consulted ethicists to publicly dissent when their input is overruled, without that dissent ending the relationship.
If those things do not appear over the next two reporting cycles, the workstream still has value, but it is the value of a serious conversation between people who care about the same problem, not the value of a governance institution.
What to watch next: the next public draft of Claude's constitution, the first published consultation summary from this workstream, and whether Anthropic names a single concrete decision where a consulted tradition's position was taken over an in-house preference. If readers see those three artifacts, the bet is paying off. If they do not, the most generous reading is that Anthropic is still listening. The least generous reading is that the company ran a sophisticated consultation and learned exactly what it expected to learn.