NVIDIA AI-Q Hits #1 on DeepResearch Bench I and II
NVIDIA's AI-Q deep research agent has claimed the top spot on both DeepResearch Bench I and DeepResearch Bench II, according to the official leaderboards for both benchmarks.
The system scored 55.95 on Bench I and 54.50 on Bench II, making it the first open blueprint to lead both benchmarks simultaneously. The results were added to the Bench I repository on March 8, 2026.
What AI-Q is: According to NVIDIA's technical documentation, AI-Q is an open blueprint for building AI agents that reason over enterprise and web data to deliver well-cited responses. The deep researcher component — the part that scored on these benchmarks — uses a multi-agent architecture with three core components: an orchestrator that coordinates the research loop, a planner that maps the information landscape, and a researcher that dispatches parallel specialists to gather and synthesize evidence.
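The plan-then-parallel-research loop described above can be sketched in plain Python. This is an illustrative mock, not NVIDIA's implementation: the `plan`, `research`, and `orchestrate` functions and the `Finding` type are hypothetical names, and the specialist work is stubbed out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Finding:
    topic: str
    evidence: str

def plan(question: str) -> list[str]:
    # Planner: map the information landscape into sub-topics (stubbed).
    return [f"{question} -- aspect {i}" for i in range(3)]

def research(topic: str) -> Finding:
    # Specialist: gather and summarize evidence for one sub-topic (stubbed).
    return Finding(topic, f"evidence for {topic!r}")

def orchestrate(question: str) -> str:
    # Orchestrator: coordinate the plan -> parallel research -> synthesize loop.
    topics = plan(question)
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(research, topics))
    # Synthesize the gathered findings into a single report.
    return "\n".join(f"- {f.topic}: {f.evidence}" for f in findings)

report = orchestrate("What is AI-Q?")
```

The key structural point is that the planner decomposes the question once, the specialists run concurrently, and the orchestrator owns synthesis; the real system layers model calls, tool use, and citation tracking onto this skeleton.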
The stack runs on NVIDIA NeMo Agent Toolkit and LangChain DeepAgents, powered by fine-tuned Nemotron 3 Super models. Enterprises can inspect, customize, and configure the system per their use case.
Why it matters: DeepResearch Bench I evaluates report quality against reference reports across comprehensiveness, depth of insight, instruction-following, and readability. DeepResearch Bench II uses over 70 fine-grained rubrics per task to check information retrieval, synthesis, and presentation. Leading on both suggests AI-Q produces both polished narratives and granular factual correctness.
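Rubric-based evaluation of the kind Bench II uses can be illustrated with a minimal weighted-checklist scorer. This is a sketch of the general technique only; the actual benchmark's rubrics and scoring code are not public in this article, and `score_report` and its phrase-matching check are assumptions.

```python
def score_report(report: str, rubrics: list[tuple[str, float]]) -> float:
    """Score a report against weighted rubric checks, on a 0-100 scale.

    Each rubric is (required_phrase, weight); a rubric is satisfied when
    the phrase appears in the report (a stand-in for real rubric judging).
    """
    total = sum(weight for _, weight in rubrics)
    earned = sum(
        weight
        for phrase, weight in rubrics
        if phrase.lower() in report.lower()
    )
    return 100.0 * earned / total

# Hypothetical rubrics checking retrieval, synthesis, and presentation.
rubrics = [
    ("citations", 1.0),
    ("methodology", 1.0),
]
score = score_report("The report includes citations.", rubrics)
```

A real evaluator would replace the phrase check with an LLM or human judgment per rubric, but the aggregation, many fine-grained checks averaged into one score, is the same shape.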
NVIDIA frames this as "a meaningful step for open, portable deep research," arguing that developer-accessible models and tooling can power state-of-the-art agentic research.
The open question: How these benchmark scores translate to real-world enterprise research performance has not been independently validated. Benchmarks measure what systems can do in controlled conditions; what they do in messy enterprise environments is a separate question that leaderboard data alone cannot answer.
This article synthesizes NVIDIA's technical documentation and Hugging Face reporting with direct verification against the live DeepResearch Bench I and II leaderboards. The benchmark scores were confirmed against the official repositories.
Sources
- agentresearchlab.org — DeepResearch Bench II Official Leaderboard
- huggingface.co — Hugging Face Blog
- github.com — DeepResearch Bench I GitHub Repository
- github.com — DeepResearch Bench II GitHub Repository