It's Not the Model, It's the Plumbing: How a 74% Boilerplate Cut Exposes the Duplicated HTTP/MCP Tax on AI Agent Stacks


Your AI agent keeps breaking. The model is not the only suspect, and for a large class of integration failures, it is not even the right one. That is the case a small arXiv preprint from Edwin Jose Chittilappilly makes with a measured, falsifiable mechanism instead of a manifesto: a Python framework called HarnessAPI that unifies a tool's HTTP endpoint and its Model Context Protocol registration into a single typed handler, and a 74% reduction in the framework-facing code required to ship a new agent skill.

The mechanism is the part worth slowing down on, because it explains why agent stacks fail in ways that look like model failures.

Today's agent stack typically expresses the same capability twice. A developer writes a Python function and exposes it over HTTP, then writes a second registration to expose it as a tool over the Model Context Protocol (MCP), the open standard that lets language models call external functions. The HTTP endpoint, the schema, the validation, the streaming, the error handling, and the MCP tool definition all live in separate places. The two sides drift as the code evolves: the HTTP path gains or renames a field, the MCP tool definition does not, and the model keeps calling the tool with parameters that no longer match the handler. The user sees a model that "does not follow instructions." The actual cause is a transport-layer desync.
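The drift described above can be made concrete with a small sketch. The two dictionaries are hypothetical stand-ins for a hand-maintained HTTP request schema and a hand-maintained MCP tool schema; the field names and the `schema_drift` helper are illustrative assumptions, not taken from the paper.

```python
# Hypothetical, hand-maintained descriptions of the same capability.
# The HTTP side has gained a "language" field; the MCP side was never updated.
HTTP_FIELDS = {"query": "string", "limit": "integer", "language": "string"}
MCP_TOOL_FIELDS = {"query": "string", "limit": "integer"}


def schema_drift(http_fields: dict, mcp_fields: dict) -> dict:
    """Report fields that exist on only one side or disagree on type."""
    shared = set(http_fields) & set(mcp_fields)
    return {
        "only_http": sorted(set(http_fields) - set(mcp_fields)),
        "only_mcp": sorted(set(mcp_fields) - set(http_fields)),
        "type_mismatch": sorted(
            name for name in shared if http_fields[name] != mcp_fields[name]
        ),
    }


drift = schema_drift(HTTP_FIELDS, MCP_TOOL_FIELDS)
print(drift)
```

A check like this could run in CI for a dual stack. A single-source design aims to make the check unnecessary, because there is only one schema to compare.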

HarnessAPI's bet is that the duplication itself is the bug. The framework takes a single handler.py plus a schema defined with Pydantic, the Python library for typed data validation, and from that one source of truth it auto-derives three things: a streaming Server-Sent Events HTTP endpoint, an OpenAPI/Swagger UI for human inspection, and a zero-configuration MCP tool. One process. One schema. The same handler serves both streaming and JSON-returning clients through dual-mode content negotiation, so a team does not maintain two adapters for clients with different response preferences.
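The single-source idea can be sketched with the standard library alone. This is not HarnessAPI's API: a dataclass stands in for the Pydantic schema, and `SearchInput`, `search`, `derive_surfaces`, and `respond` are hypothetical names. The sketch only shows one typed handler feeding both an HTTP route description and an MCP-style tool description, plus a response format chosen from the client's `Accept` header.

```python
import json
from dataclasses import MISSING, dataclass, fields
from typing import get_type_hints

# Map Python annotations to JSON Schema type names (only what this sketch needs).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class SearchInput:
    """Hypothetical skill input; a stdlib stand-in for a Pydantic model."""
    query: str
    limit: int = 10


def search(params: SearchInput) -> dict:
    """Hypothetical skill: return placeholder search results."""
    return {"results": [f"{params.query}-{i}" for i in range(params.limit)]}


def input_schema(model) -> dict:
    """Build a JSON-Schema-style description from the model's typed fields."""
    hints = get_type_hints(model)
    return {
        "type": "object",
        "properties": {
            f.name: {"type": JSON_TYPES[hints[f.name]]} for f in fields(model)
        },
        "required": [f.name for f in fields(model) if f.default is MISSING],
    }


def derive_surfaces(handler):
    """Derive an HTTP route description and an MCP-style tool description."""
    model = get_type_hints(handler)["params"]
    schema = input_schema(model)  # one schema object, shared by both surfaces
    http_route = {"method": "POST", "path": f"/{handler.__name__}", "body": schema}
    mcp_tool = {
        "name": handler.__name__,
        "description": handler.__doc__,
        "inputSchema": schema,
    }
    return http_route, mcp_tool


def respond(result: dict, accept: str):
    """Pick a response format from the client's Accept header."""
    body = json.dumps(result)
    if "text/event-stream" in accept:
        return "text/event-stream", f"data: {body}\n\n"  # a single SSE event
    return "application/json", body


http_route, mcp_tool = derive_surfaces(search)
result = search(SearchInput(query="mcp", limit=2))
print(mcp_tool["inputSchema"])
print(respond(result, "text/event-stream"))
print(respond(result, "application/json"))
```

Because both descriptions point at the same schema object, adding a field to `SearchInput` changes the HTTP and MCP surfaces together, which is the property the framework is betting on.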

The 74% figure is concrete and measured. Across six representative skills, the author used cloc, a widely used tool for counting lines of source code, to compare the framework-facing boilerplate in a manually maintained FastAPI plus FastMCP dual stack against HarnessAPI's single-handler version. The reduction applies to the framework overhead, not the handler logic itself, which is the right thing to measure if the hypothesis is that duplicated plumbing is the failure surface.
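For readers who want to run the same comparison on their own code, the arithmetic behind a figure like this is a simple relative reduction. The line counts below are placeholders chosen only so the calculation lands on the reported percentage; they are not numbers from the paper, and `percent_reduction` is a hypothetical helper.

```python
def percent_reduction(baseline_lines: int, unified_lines: int) -> float:
    """Relative reduction in framework-facing lines, as a percentage."""
    if baseline_lines <= 0:
        raise ValueError("baseline_lines must be positive")
    return 100 * (baseline_lines - unified_lines) / baseline_lines


# Placeholder counts (NOT from the paper): boilerplate lines reported by a
# line counter for a dual stack versus a single-handler version.
baseline_lines = 200
unified_lines = 52

reduction = percent_reduction(baseline_lines, unified_lines)
print(f"{reduction:.0f}% reduction in framework-facing lines")
```

The result depends entirely on which lines are counted as framework-facing, so the counting rule matters as much as the percentage.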

This is also a developer-side lever, not a model story. Chittilappilly is not arguing that model capability is irrelevant. He is arguing that when an agent misbehaves, the model is one of several suspects, and the tool and transport layer is a more actionable one. As he put it in a LinkedIn post announcing the work, "MCP is the new API." The implication is that tool and agent integration is becoming infrastructure-shaped, a layer teams have to design deliberately rather than glue together per app.

The architecture has a second quiet win. HarnessAPI subclasses FastAPI, the popular Python web framework, so it inherits the existing middleware, dependency injection, and deployment ecosystem rather than asking teams to abandon it. Adoption friction is closer to "add a folder" than "port the stack." The code is publicly released on GitHub and packaged on [PyPI as harnessapi](https://pypi.org/project/harnessapi/), which gives the paper a primary-source release artifact rather than a design sketch.

The paper also addresses a real technical snag. FastMCP, the popular MCP server framework, has a known limitation: it inspects closures rather than re-parsing type annotations, so handlers defined as closures lose their Pydantic types. HarnessAPI uses dynamic code generation to propagate the Pydantic types into FastMCP's inspection layer, which is the kind of plumbing detail that explains why naive agent integrations break in production.
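The general technique of propagating types through generated code can be sketched with the standard library. This is an illustration of the idea, not HarnessAPI's implementation and not FastMCP's inspection logic: a dataclass stands in for the Pydantic model, and `make_handler` and `with_typed_signature` are hypothetical helpers. A closure that accepts `**kwargs` exposes no field types to signature inspection, so the sketch generates a wrapper whose parameters carry the model's annotations and defaults.

```python
import inspect
from dataclasses import MISSING, dataclass, fields
from typing import get_type_hints


@dataclass
class SearchInput:
    """Hypothetical skill input; a stdlib stand-in for a Pydantic model."""
    query: str
    limit: int = 10


def make_handler(prefix: str):
    """Return a closure whose signature says nothing about its inputs."""
    def handler(**kwargs):
        params = SearchInput(**kwargs)
        return [f"{prefix}:{params.query}:{i}" for i in range(params.limit)]
    return handler


def with_typed_signature(handler, model):
    """Generate a wrapper with one explicitly typed parameter per model field."""
    hints = get_type_hints(model)
    # Simple defaults only; default_factory is out of scope for this sketch.
    defaults = {f.name: f.default for f in fields(model) if f.default is not MISSING}
    params = []
    for f in fields(model):
        param = f"{f.name}: _hints[{f.name!r}]"
        if f.name in defaults:
            param += f" = _defaults[{f.name!r}]"
        params.append(param)
    call_args = ", ".join(f"{f.name}={f.name}" for f in fields(model))
    source = f"def typed({', '.join(params)}):\n    return _handler({call_args})\n"
    namespace = {"_handler": handler, "_hints": hints, "_defaults": defaults}
    # Annotations and defaults are evaluated when the generated def runs.
    exec(source, namespace)
    return namespace["typed"]


raw = make_handler("x")
typed = with_typed_signature(raw, SearchInput)
print(inspect.signature(raw))
print(inspect.signature(typed))
```

Any tool that builds its schema from `inspect.signature` would see a single untyped `**kwargs` on `raw` but named, annotated parameters on `typed`, while both call the same underlying closure.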

The corroborating evidence is not from the author. An independent empirical study of MCP and tool integration documents real plumbing-layer pain in production agent systems. Harness.io's engineering team, a vendor with its own MCP product, published a detailed post on redesigning their MCP server to address the same class of failure. That post is a practitioner's perspective rather than neutral third-party market evidence, but it is written by a team shipping MCP servers to paying customers, and the failure mode it describes matches the paper's diagnosis. Independent commentary on the broader pattern has also appeared on AI Sherpa, framing the "skill-first inversion" as a way to make agent stacks inspectable rather than mysterious.

Where to be careful. arXiv:2605.22733 is a preprint, not peer-reviewed. The 74% reduction is the author's measurement on six skills of his own choosing, against a hand-built baseline he also chose. That is a worked example, not a benchmark. The mechanism is plausible and the release artifact is real, but the claim that this generalizes to every agent stack is the paper's bet, not a settled fact. There are no independent production adoption metrics for HarnessAPI, no named users, and no download trajectory in the public record yet. If a team adopts it, the right question is not "does it reduce my framework code by 74%" but "does my handler logic change shape when the schema is the source of truth?" The 74% is a useful reference number, not a guarantee.

What to watch. The interesting next move is whether the HarnessAPI repository gets external contributors, and whether the dynamic code-generation trick to feed Pydantic types into FastMCP's inspection layer becomes a pattern other frameworks adopt. The MCP standard is still young, the ecosystem is still in motion, and the most useful thing this preprint does is name a failure mode that gets misattributed to models. If your agent is misbehaving, the model is one suspect. The transport layer is another, and it is the one you can actually change.

