Your AI Agents Are Flying Blind Without This

Solo.io has announced agentevals, an open-source evaluation tool for AI agents that scores behavior directly from existing OpenTelemetry traces rather than re-executing test scenarios. Unveiled at KubeCon Europe 2026, the tool targets the evaluation gap in agentic infrastructure by leveraging production instrumentation data already collected through OTel. It supports the LangChain, Strands, and Google ADK frameworks, installs via pip, and includes LLM-based judges, though its effectiveness depends on the completeness of the underlying traces.
- Agentevals evaluates agents from production OTel traces instead of re-running test scenarios, avoiding token costs and enabling eval against real-world production data.
- Framework support includes LangChain, Strands, and Google ADK out of the box, though eval quality is constrained by whatever instrumentation gaps exist in those frameworks.
- The tool supports Jaeger JSON and OTLP trace formats, ships with built-in and custom evaluators, and exposes CLI, web UI, and MCP server interfaces.
Every agent framework ships with an evaluation tool. Most of them work the same way: run the agent, measure the output, score the result. It's the benchmark equivalent of a driving test where you rebuild the car after every lap.
Solo.io's new open-source tool, agentevals, takes a different approach. Announced at KubeCon + CloudNativeCon Europe 2026 in Amsterdam on March 25, it scores AI agent behavior directly from OpenTelemetry (OTel) traces that you already collected in production — no re-execution, no token burn, no rebuilding the car. GitHub: agentevals
"Evaluation is the biggest unsolved problem in agentic infrastructure today," said Idit Levine, Solo.io's founder and CEO, in the company's announcement. "Organizations have frameworks for building agents, gateways for connecting them, and registries for governing them, but no consistent way to know whether an agent is actually reliable enough to trust in production." GlobeNewswire
The pitch is technically differentiated from the incumbents. LangSmith, DeepEval, and LangChain's own agentevals all require re-running agents through test scenarios to generate evaluation data. Agentevals instead assumes you've already instrumented your agents with OpenTelemetry — a reasonable assumption, given that OTel is the observability standard for distributed systems — and scores behavior from whatever traces you already have. From there, you can run arbitrary eval suites against the same recorded trace data without touching the agent again.
It's a bet on existing infrastructure rather than a new testing harness, and it's the kind of thing that only works if the traces are actually rich enough to evaluate from. The tool supports Jaeger JSON and OTLP trace formats, ships with built-in evaluators and custom evaluator support, and includes LLM-based judges for scoring. It installs via pip (pip install agentevals-cli) and exposes a CLI, web UI, and MCP server. GitHub: agentevals
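The trace-scoring idea is easy to sketch in plain Python. To be clear, this is not agentevals' actual API: the `load_spans` and `tool_error_rate` functions, the `operationName` prefix convention, and the evaluator logic are all illustrative assumptions, loosely modeled on the Jaeger JSON shape the tool accepts. The point is the workflow: a recorded trace is scored as data, with no agent re-execution and no new tokens spent.

```python
import json

def load_spans(jaeger_json: str):
    """Flatten a Jaeger-style JSON export into a flat list of span dicts."""
    doc = json.loads(jaeger_json)
    return [span for trace in doc.get("data", []) for span in trace.get("spans", [])]

def tool_error_rate(spans):
    """Example evaluator: fraction of tool-call spans that recorded an error tag."""
    tool_spans = [s for s in spans if s.get("operationName", "").startswith("tool.")]
    if not tool_spans:
        return None  # trace too sparse to score: the gap the article warns about
    errors = sum(
        1 for s in tool_spans
        if any(t.get("key") == "error" and t.get("value") for t in s.get("tags", []))
    )
    return errors / len(tool_spans)

# A hypothetical recorded trace with two tool calls, one of which errored.
sample = json.dumps({"data": [{"spans": [
    {"operationName": "tool.search", "tags": [{"key": "error", "value": False}]},
    {"operationName": "tool.fetch",  "tags": [{"key": "error", "value": True}]},
    {"operationName": "llm.call",    "tags": []},
]}]})
print(tool_error_rate(load_spans(sample)))  # 0.5
```

Because the evaluator is a pure function over recorded spans, you can add or revise eval suites later and re-score the same traces, which is the architectural difference from re-execution tools.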
The framework compatibility list is notable: LangChain, Strands, and Google ADK are explicitly supported. That covers a significant chunk of the agent framework landscape, but it also means agentevals inherits whatever instrumentation gaps exist in those frameworks. If an agent doesn't produce complete OTel spans — missing tool call parameters, unrecorded tool responses, gaps in the trace chain — the eval will be working from partial data.
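A completeness audit along those lines is straightforward to sketch. The span keys below (`spanID`, `parentID`, `kind`, `attributes`) and the required attribute names are hypothetical, not a real OTel schema; instrumentation conventions vary by framework. The idea is that a trace-based evaluator should flag traces too sparse to score rather than silently grading partial data.

```python
def audit_trace(spans):
    """Flag instrumentation gaps that would undermine a trace-based eval.

    `spans` is a list of dicts; the keys used here are illustrative only.
    """
    problems = []
    ids = {s["spanID"] for s in spans}
    for s in spans:
        # Broken parent chain: span references a parent that was never exported.
        parent = s.get("parentID")
        if parent and parent not in ids:
            problems.append(f"{s['spanID']}: orphaned (parent {parent} missing)")
        # Tool-call spans missing the data an evaluator needs to judge them.
        if s.get("kind") == "tool":
            attrs = s.get("attributes", {})
            for required in ("tool.parameters", "tool.response"):
                if required not in attrs:
                    problems.append(f"{s['spanID']}: missing {required}")
    return problems

spans = [
    {"spanID": "a", "parentID": None, "kind": "agent", "attributes": {}},
    {"spanID": "b", "parentID": "a", "kind": "tool",
     "attributes": {"tool.parameters": "{}"}},  # response never recorded
    {"spanID": "c", "parentID": "zzz", "kind": "llm", "attributes": {}},
]
for p in audit_trace(spans):
    print(p)
```

An agent framework that drops tool responses or breaks the span chain would trip both checks, and any score computed over such a trace would be measuring the instrumentation as much as the agent.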
Solo.io is simultaneously contributing agentregistry, a registry and discovery tool for AI agents, MCP tools, and agent skills, to the Cloud Native Computing Foundation (CNCF). The project was originally introduced in November 2025 and is now entering the CNCF donation process alongside the agentevals launch. Cloud Native Now The registry integrates with Kubernetes, AWS AgentCore, and Google Vertex AI for deployment, and includes runtime discovery to detect agents running outside governed workflows — what the company calls shadow inventory. Cloud Native Now
This is the fourth layer in what Solo.io is positioning as a coherent agent infrastructure stack. Kagent, a framework for building and running AI agents natively in Kubernetes, was accepted into CNCF Sandbox on May 22, 2025 and has grown to 3,414 contributors, 1,119 stars, and 658 releases — growth the company frames as evidence of adoption, though year-over-year percentage comparisons in CNCF project announcements tend to be chosen for effect. CNCF Agentgateway, Solo.io's AI gateway with full MCP and A2A protocol support, is housed under the Linux Foundation. Agentregistry is in CNCF donation. Agentevals is the new piece that connects the registry to evaluation.
The four-layer framing is marketing architecture, not technical debt — but the underlying bet is real. If OpenTelemetry becomes the universal observability substrate for agentic systems, the tooling built on top of it inherits enormous leverage. Solo.io is positioning for that inflection point by making OTel traces do double duty: they're already there for debugging, and now they're also the input to quality evaluation.
There is a naming collision worth flagging. LangChain maintains a separate project called agentevals at github.com/langchain-ai/agentevals that uses a re-execution model — different architecture, same name. Both projects address agent evaluation, but they solve different parts of the problem and require different integration paths. GitHub: langchain-ai/agentevals
The harder question agentevals doesn't fully answer is what the Kang et al. research raised about benchmark validity: evaluation tools measure what you can measure, not whether the measurement correlates with real-world agent reliability. Scoring from traces is genuinely novel. Whether those traces contain the signal that actually predicts production quality is still an open problem — and one that better tooling, by itself, cannot solve.
Sources
- github.com — agentevals GitHub Repository
- globenewswire.com — GlobeNewswire
- cloudnativenow.com — Cloud Native Now
- cncf.io — CNCF
- github.com — GitHub: langchain-ai/agentevals