# Anthropic's AI 'Constitution' Shows Early Promise, Researchers Say

Anthropic's effort to write a 30,000-word "constitution" for its AI appears to be working—at least so far.
The document constrains Claude's behavior, including by setting limits on dishonesty and on causing harm. According to Semafor, researchers broke the document down into 205 rules and found that newer, constitution-trained models were much less likely to break them than older ones.
The results aren't perfect. Claude still occasionally uses fabricated data, and its rules sometimes conflict: an AI instructed to lie must break either the rule against dishonesty or the rule on following operator instructions. But AI-risk experts argue that human values are complex and that smart AIs would find dangerous loopholes, so successfully instilling the constitution would be "a big deal for safety," the researchers said.
The researchers were "fairly impressed" with the results.
Sources
- Semafor (semafor.com)
