Your AI Corporation: A Defense-in-Depth Posture for the Bot-Flooded Decade

PREVIEWYour AI Corporation: A Defense-in-Depth Posture for the Bot-Flooded Decade · MD

Most writing about personal AI assistants asks the wrong question. The interesting one is not "how well does the chatbot answer?" It is "who is in charge when the chatbot acts?" A long essay from Gwern, the independent writer behind gwern.net, argues the answer should be: you, and the AI should be structured to make that answer enforceable rather than aspirational.

The essay, titled "Guardian Angels: LLM Personalization for Productivity and Security," is not a product announcement. It is a blueprint for a class of personalized AI systems the author calls Guardian Angels, or GAs. The core idea is that a GA does not stand in for the user. It amplifies a single principal, hardwiring itself to one situated person's values, preferences, and judgment. The user defines what is worth doing. The GA figures out how. The author frames the relationship through a CEO-and-board mental model, in which the user acts as the board of a small AI corporation made up of GA agents, with the GA agents as the executive layer that handles execution.

The near-term motivation is not abstract. The author points to specific harms that an amplifier architecture would actually defend against: synthetic-media ecosystems used for propaganda and spearphishing, pig-butchering scams, and AI-slop content farms. One anecdote in the essay, of an elderly relative who handed her phone to her daughter to avoid being scammed, captures the broader failure mode. The person who should be using the technology is the one being routed around it.

The GA blueprint is grounded in a specific mechanism package rather than slogans. Three techniques carry the load. First, dynamic evaluation, the practice of updating an LLM's weights in real time based on the principal's behavior and corrections, is meant to keep the model aligned to a single user rather than to a population. Second, active learning with DAgger-style low-regret preference queries lets the GA ask the principal only about cases where the answer is uncertain, reducing the cost of personalization. Third, heavy inner-monologue search and data augmentation is intended to let a small, deployable model produce work that a much larger generic model would otherwise monopolize. The supporting paper for the data-efficiency claim is Power et al.'s 2022 "grokking" work, which showed generalization arising well past overfitting on small algorithmic datasets. It is a useful background citation, not direct evidence that GAs will work in production.

The security argument is the part of the essay that lands hardest against 2026 reality. The author claims a GA, because it is allied to one situated user, is structurally resistant to the confused-deputy attacks that generic assistants face, where a prompt injection can hijack the agent into acting on an attacker's instructions. The principal/agent architecture is treated not as a UX preference but as a non-negotiable security property. This framing is useful because it gives buyers a concrete way to evaluate vendor claims. If an assistant can be coerced into acting against the user's interest by content in a message, it is not a Guardian Angel. It is a deputy without a principal.

The 2026 personal-assistant landscape makes that bar non-trivial. MIT Technology Review reported in February 2026 that OpenClaw, the viral personal-AI project from Peter Steinberger, drew security scrutiny from CrowdStrike, Bitsight, Cisco, 1Password, Security.com, Trend Micro, Palo Alto Networks, and the Chinese government, and that a coding agent from a related project reportedly wiped a user's drive. The Hacker News and Intruder summarized a scan of roughly one million exposed AI services that found them "more vulnerable, exposed, and misconfigured than any other software we've ever investigated," with no-auth-by-default deployments, exposed chatbot histories, and exposed agent-management platforms like n8n and Flowise. Those figures are vendor-reported, but the pattern matches the press accounts. The default exposure of consumer-facing AI agents in 2026 is, on the available evidence, bad.

The GA essay does not solve that. The author is explicit that Guardian Angels are a defense-in-depth layer for individuals within a society-wide posture, not a solution to AI alignment. A GA cannot stop a determined state actor, cannot prevent prompt injection at the model layer, and cannot replace collective action on platform abuse. What it can do, on the author's argument, is make the individual user a harder target than the population median, and make the cost of personalized attack scale with the attacker's effort rather than the user's surface area. The author's self-rated confidence in the proposal is "possible," not "likely." No third-party deployment study, user study, or external validation is in the source packet.

For readers evaluating AI assistants in 2026, the essay's most portable contribution is the posture. Treat any assistant you adopt as an executive that reports to you, not a deputy that can be redirected by whoever last spoke to it. Demand that personalization be load-bearing, meaning the model actually adapts to you rather than pretending to in a context window. Ask vendors to specify which of the three mechanisms, dynamic evaluation, active preference elicitation, or inner-monologue search, they actually implement. And assume that until a vendor can answer those questions, the security claims on the marketing page are doing work the architecture is not.

The next thing to watch is whether any 2026 product release ships dynamic evaluation against a single principal, in public. So far, none has.

Your AI Corporation: A Defense-in-Depth Posture for the Bot-Flooded Decade — type0 | type0

Your AI Corporation: A Defense-in-Depth Posture for the Bot-Flooded Decade

Sources