After a tokenizer change spiked its inference bill, Weave open-sources its model router

After a tokenizer change spiked its inference bill, Weave open-sources its model router — type0 | type0

PREVIEWAfter a tokenizer change spiked its inference bill, Weave open-sources its model router · MD

After Anthropic tweaked Opus 4.7's tokenizer, Weave's inference bill jumped. The company's engineering team writes nearly all of its code with AI, and that tokenizer change quietly repriced every routine turn in their workflow. Their response, now public on GitHub, is Weave Router: a local proxy that decides, per request, whether a coding turn deserves a frontier model or whether a cheaper open-weight model will do the job.

The routing mechanism is not a vibes-based prompt classifier. According to Weave's launch post, the router was trained via reinforcement learning on tens of thousands of internal agent traces, with the reward signal tied to whether the routed model actually succeeded at its sub-task. The cluster scorer is derived from Avengers-Pro, an arXiv preprint from earlier in 2025 that explores multi-model agent collaboration. That provenance matters: routing is being treated as a learned policy problem rather than a hand-tuned rules engine, which is a meaningful design choice when the workload shifts.

The router sits at localhost:8080 and presents itself as an Anthropic Messages, OpenAI Chat Completions, or Gemini-compatible endpoint. The Show HN thread describes drop-in support for Claude Code, Codex, and Cursor, with provider keys held locally and encrypted at rest by default. It supports streaming, tool use, and vision inputs. Downstream, it can target frontier providers (Anthropic, OpenAI, Google) or open-weight models via OpenRouter or any OpenAI-compatible endpoint, including DeepSeek, Kimi, GLM, Qwen, Llama, and Mistral.

On the benchmark question, Weave claims the top spot on the RouterArena Acc-Cost Arena leaderboard at a score of 76.09 as of the Show HN submission. RouterArena is held under a separate GitHub organization, RouteWorks, which gives the leaderboard claim at least some independent surface. The acc-cost framing, however, is a benchmark artifact: it rewards routers that hold accuracy while lowering spend, which is exactly what Weave's design optimizes for. Treat the 76.09 as RouterArena's measurement of Weave's router under RouterArena's protocol, not as a universal ranking of inference efficiency.

A note on customer framing. Weave positions itself as an engineering intelligence platform and lists Robinhood, PostHog, and Reducto among its customers, along with what its blog describes as hundreds of others. Those logos are attached to Weave's broader product, not specifically to the router, and the router repo does not claim production adoption by those names. Readers evaluating the router as enterprise software should distinguish between Weave-the-platform and Weave Router-the-newly-open-sourced-tool.

For telemetry, the router ships OTLP traces out of the box and exposes a local dashboard with export hooks for Honeycomb, Datadog, and Grafana. That instrumentation matters for the cost-control thesis: a router that picks a cheaper model but breaks your agent loop is not a saving, and Weave's design treats observability as load-bearing rather than ornamental.

What to watch next. Three things will determine whether the router's RL-trained scorer generalizes beyond Weave's own traces. First, whether independent operators reproduce the 76.09 on RouterArena's leaderboard when they run their own workloads through the proxy. Second, how the policy behaves the next time a frontier provider changes tokenizers or pricing, which is the failure mode that produced the tool in the first place. Third, whether the open-source community treats Avengers-Pro as the right starting point for the cluster scorer, or whether simpler prompt-classifier approaches match it on real agent traffic. Weave's tokenizer-driven origin story is a useful reminder that in inference economics, the bill can move without the model itself changing.

After a tokenizer change spiked its inference bill, Weave open-sources its model router

Sources