A spacecraft navigating adversarial orbital encounters doesn't need to retrain its brain to get better at dodging threats. It just needs a better playbook.
GUIDE, a framework from researchers at MIT and Universidad Politécnica de Madrid, uses a frontier language model as a supervisory agent that evolves structured natural-language decision rules across mission episodes — without touching the model's weights. The acting model that flies the spacecraft stays fixed. What's updated is the playbook: a collection of state-conditioned rules injected into the model's context at runtime. The work appears in a preprint posted to arXiv on March 28, 2026 and accepted to the AI4Space@CVPR Workshop.
The architecture is a two-model split. A lightweight model handles real-time control — it needs to respond in seconds and can't afford the latency of a frontier reasoning call. A larger frontier model, the Reflector, runs offline after each episode, reads the trajectory data, and proposes updates to the decision playbook using structured operations: ADD a new rule, UPDATE an existing one, or REMOVE a broken one. The system maintains multiple playbook candidates simultaneously and selects among them using the UCB1 bandit algorithm, which balances trying new versions against exploiting the ones that have performed well. There's also an ε-biased sampling scheme: the Reflector looks at poorly performing episodes most of the time, but occasionally reflects on successes too — to understand why things worked, not just why they didn't.
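The selection-and-reflection loop can be sketched in a few lines. This is an illustrative reconstruction, not the released code: the `Playbook` class, the reward sign convention (episode scores negated so UCB1 can maximize), and the ε value are assumptions; only the ADD/UPDATE/REMOVE operations, UCB1 selection, and failure-biased sampling come from the paper.

```python
import math
import random

class Playbook:
    """One candidate playbook: a dict of named natural-language rules
    plus the bandit statistics UCB1 needs."""
    def __init__(self, rules):
        self.rules = dict(rules)   # rule_id -> rule text
        self.pulls = 0             # episodes flown with this playbook
        self.total_reward = 0.0    # sum of negated episode scores

    def apply_op(self, op, rule_id, text=None):
        # Structured edit operations proposed by the Reflector.
        if op in ("ADD", "UPDATE"):
            self.rules[rule_id] = text
        elif op == "REMOVE":
            self.rules.pop(rule_id, None)

def select_playbook(candidates):
    """UCB1: exploit playbooks with high mean reward, but keep an
    exploration bonus for rarely tried ones."""
    total_pulls = sum(p.pulls for p in candidates)
    def ucb(p):
        if p.pulls == 0:
            return float("inf")  # every candidate gets flown at least once
        mean = p.total_reward / p.pulls
        return mean + math.sqrt(2 * math.log(total_pulls) / p.pulls)
    return max(candidates, key=ucb)

def sample_episode_for_reflection(episodes, eps=0.2):
    """ε-biased sampling: mostly reflect on failures (high score, since
    lower is better), occasionally on successes."""
    if random.random() < eps:
        return min(episodes, key=lambda e: e["score"])  # a success
    return max(episodes, key=lambda e: e["score"])      # a failure
```

The untried-candidate shortcut (`pulls == 0` returns infinity) is the standard UCB1 initialization: every playbook variant gets at least one episode before the exploit/explore trade-off kicks in.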
What the system learned to do reveals the approach. In early episodes, the spacecraft agent pursued its target aggressively — the same strategy you'd expect from a naive baseline. But the guard (simulating an adversarial interceptor in the Kerbal Space Program Differential Games environment) would close in from behind while the agent was focused forward. The Reflector identified this failure mode within a few evolution cycles and proposed a guard-avoidance rule: when the guard drops below roughly 220 meters, stop all forward pursuit and apply lateral or vertical evasive thrust until the distance opens back up. After a few playbook updates, that rule was active and the composite interception score — lower is better, combining how close the spacecraft got to its target and how close the guard got to the spacecraft — dropped by 82 to 99 percent on the hardest scenarios.
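In GUIDE, that rule lives as natural-language text injected into the acting model's context, not as code — but its conditional structure is easy to make concrete. A minimal sketch, where the `Thrust` type and the state fields are illustrative assumptions; only the ~220 m threshold comes from the learned rule described above:

```python
from dataclasses import dataclass

GUARD_EVASION_RANGE_M = 220.0  # threshold from the learned rule

@dataclass
class Thrust:
    forward: float
    lateral: float
    vertical: float

def guard_avoidance_rule(guard_distance_m: float, pursuit_cmd: Thrust) -> Thrust:
    """If the guard is inside the evasion range, suppress all forward
    pursuit and apply lateral/vertical evasive thrust; otherwise pass
    the nominal pursuit command through unchanged."""
    if guard_distance_m < GUARD_EVASION_RANGE_M:
        return Thrust(forward=0.0, lateral=1.0, vertical=1.0)
    return pursuit_cmd
```

The point of the playbook representation is that this same conditional is legible in its natural-language form — the acting model reads it from context at runtime rather than executing it as compiled logic.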
On the LG7 scenario (dual guard, blocking strategy), GUIDE's best evolved playbook scored 3.49×10^4 versus the static baseline's 1.95×10^7. That's roughly a 99.8 percent reduction. LG6 showed similar results: 5.07×10^4 versus 2.28×10^7, statistically significant at p<0.001. LG4 was more modest — 7.22×10^4 versus 4.15×10^5, significant at p=0.043. LG5 showed improvement in the point estimates but wasn't statistically significant at the 0.05 level. The headline numbers are real. The variance is real too.
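Those reductions follow directly from the reported scores. A quick sanity check of the arithmetic, with lower scores being better:

```python
def reduction_pct(evolved: float, baseline: float) -> float:
    """Percentage reduction of the evolved playbook's score
    relative to the static baseline."""
    return 100.0 * (1.0 - evolved / baseline)

print(f"LG7: {reduction_pct(3.49e4, 1.95e7):.1f}%")  # ≈ 99.8%
print(f"LG6: {reduction_pct(5.07e4, 2.28e7):.1f}%")  # ≈ 99.8%
print(f"LG4: {reduction_pct(7.22e4, 4.15e5):.1f}%")  # ≈ 82.6%
```

The LG4 figure is why the paper's range reads "82 to 99 percent" rather than a single headline number.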
The interpretability angle is what makes this architecturally interesting beyond the benchmark. A neural network's learned policy is a black box — you can probe it, but you can't read it. A natural-language decision rulebook is, by construction, readable. "When the Guard is closing inside ~220m, stop all forward pursuit and instead apply continuous lateral and/or vertical evasive thrust" is a rule a flight software engineer can read, reason about, and override. That's a meaningful difference for spacecraft autonomy, where certification requirements often demand that decisions be explainable after the fact. Whether a natural-language playbook satisfies a regulator is an open question — but it's at least a path toward the conversation.
The researchers — Alejandro Carrasco, Mariko Storey-Matsutani, and Richard Linares at MIT, and Victor Rodriguez-Fernandez at Universidad Politécnica de Madrid — have prior form in this test environment. The same team placed second in the KSPDG Challenge at the 2025 AIAA SciTech Forum, a public competition where participants build autonomous agents for non-cooperative orbital maneuvering in the Kerbal Space Program simulation. The Air Force Artificial Intelligence Accelerator, under a cooperative agreement with the Department of the Air Force, sponsored the work, with additional support from the Spanish Agencia Estatal de Investigación.
GUIDE is closest in spirit to ACE, another self-evolving agent framework. The paper's own framing of the distinction is worth quoting directly: rather than using evolving context solely to guide behavior, the context itself — the playbook — becomes the learned decision representation that shapes the actions of a separate online agent. Both frameworks keep what's learned in a document rather than in weights; the difference is that in GUIDE, one model writes that document and a separate acting model executes it as policy.
Whether that distinction matters for production spacecraft autonomy is an open question. KSP's physics are a game-grade approximation, not high-fidelity orbital mechanics. The scenarios are tractable — the paper acknowledges these constraints explicitly. But the architecture is real, the paper is on arXiv, and the question it asks is the right one: if the thing you need to adapt lives in context rather than weights, what does that mean for systems where weights can't change?
The playbook, it turns out, is also easier to audit than a weight matrix. That's not nothing.