The helpful chatbot is getting quieter, and the prompt wrapper enforcing the silence is a compact system-instruction wrapper. Julius Brussee released a public plugin called Caveman in early April that rewrites Claude and Codex output to strip pleasantries, hedging, transitions, and most filler. According to 404 Media, staff at multiple companies, including OpenAI, have since adopted it inside their own Claude Code workflows.
The plugin forces terse, telegraphic output. 404 Media describes the change as brutal: the friendly preamble is gone, the recap of the user's request is gone, the softening hedges ("Sure, here's what I did…") are gone. The model is told to be useful first, polite second. The result reads like a more efficient senior engineer and less like a customer-service script.
Caveman is one symptom of a larger shift in how AI gets priced. When OpenAI, Anthropic, and Google sold access under flat-rate subscriptions or generous API bundles, verbosity was a free feature. An extra paragraph of explanation cost the provider money but not the user. The useful chatbot could afford to be chatty because the bill landed somewhere else.
Per-token pricing ended that. Every word a model emits is now a line item on the customer's invoice, and inference cost has stayed high enough that providers keep tightening the meter: usage-based plans, higher output-token rates, deprecation of flat tiers. Under those conditions, the friendly preamble stops looking like a feature. It starts looking like waste.
404 Media reports that some plugin users have posted API cost reductions of up to roughly 65%. That figure is user-reported, not independently audited. Actual savings depend on the workload: tasks that already produced short, structured output see little change, while conversational or research-style prompts can drop sharply once every "let me walk you through this" is excised.
Caveman is not a regression from helpful AI. It is helpful AI's honest price. Once the user pays per token, the model's job shifts from sounding reassuring to delivering information with as little overhead as possible. A wrapper that strips pleasantries just aligns the output with the billing model.
The Caveman repository is MIT-licensed and short enough to audit in a sitting, which is part of why it spread. Other prompt-engineering tools with similar goals — compression wrappers, "concise mode" system prompts, output-length caps — have appeared, but most live inside closed workflows. Caveman turned the pattern into something a developer can fork in a weekend.
The terseness carries a cost outside the engineering org. End users, support teams, and non-technical staff who relied on the verbose tone to gauge model confidence or to follow the model's reasoning are now reading stripped output with no preamble to lean on. The internal optimization looks like a quality regression to anyone who wasn't billed by the token.
The next move belongs to the major providers. They could ship a first-party "concise mode" that captures the same savings without a third-party plugin, fold the optimization into the model itself, or let terseness creep in by default as output-token prices keep climbing. Brussee's plugin made the incentive visible; whether the labs absorb the pattern or leave developers to bolt it on themselves is the open question.