The Open-Source AI Test: Download Tencent's Hy3, See If It Holds Up
Tencent released its first flagship AI model on Thursday, and unlike most announcements from major labs, this one comes with a quirk: you can download the weights and test it yourself.
Hy3, formally Hunyuan 3.0 Preview, is a 295-billion-parameter mixture-of-experts model that activates 21 billion parameters per forward pass. The numbers are notable on their own. On SWE-bench, a benchmark that tests whether an AI model can resolve real software engineering problems from open-source repositories, Hy3 scored 74.4 percent, up from 53 percent for the prior generation, according to the company's model card on HuggingFace. On LiveCodeBench-v6, a continuously evaluated coding benchmark, the model posted 34.86, which Tencent says is the highest score in its comparison table.
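For readers unfamiliar with how these percentages are produced: a SWE-bench style score is simply the fraction of task instances whose generated patch makes the repository's test suite pass. A generic sketch of that bookkeeping, with illustrative instance counts, not the official evaluation harness:

```python
# Generic sketch of benchmark-style scoring: the percentage of tasks "resolved".
# The counts below are illustrative only; they are not Tencent's actual run.

def score(results: list[bool]) -> float:
    """Percentage of task instances where the model's patch passed the tests."""
    return 100.0 * sum(results) / len(results)

# A hypothetical run resolving 372 of 500 instances yields the same arithmetic
# as the headline 74.4 percent figure.
print(score([True] * 372 + [False] * 128))  # 74.4
```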
Training began at the end of January 2026, and the model shipped on April 23, a sub-three-month cycle that stands out against typical flagship development timelines. The project is led by Yao Shunyu, who joined Tencent as chief AI scientist in late 2025 after leaving OpenAI, where he worked on the GPT-4 team.
What makes this different from most model announcements is the open-source approach. Tencent posted the Hy3 preview weights on HuggingFace for free download, and the model is also accessible via OpenRouter under the name Hy3preview. Most frontier model releases from major labs stay behind APIs or wait for staged weight releases; Tencent shipped open weights on the same day the announcement went out.
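The OpenRouter route is the lower-friction way to poke at the model. A minimal sketch of an OpenAI-compatible request to OpenRouter's chat completions endpoint, assuming the model slug matches the announced name `Hy3preview` (the exact slug format is not confirmed here), with the actual network call left to the reader:

```python
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_SLUG = "Hy3preview"  # the name given in the announcement; slug format unverified

def build_request(prompt: str, api_key: str) -> tuple[dict, dict]:
    """Build headers and an OpenAI-compatible chat payload for OpenRouter."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": MODEL_SLUG,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

headers, payload = build_request("Reverse a linked list in C.", "sk-your-key-here")
print(json.dumps(payload, indent=2))
# To send for real: POST json.dumps(payload) to OPENROUTER_URL with these headers,
# e.g. via urllib.request or the requests library.
```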
The technical configuration is a deliberate choice in the scaling debate. Rather than push toward trillions of parameters, Tencent built a model with 192 experts, activating eight per token, for a 295-billion-parameter total. Yao has said publicly that the sweet spot for the current generation of hardware sits around 300 billion parameters, a view that puts Tencent explicitly on one side of an ongoing argument in AI research about where the efficiency gains are.
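The sparsity arithmetic is easy to check: 8 of 192 experts firing per token is about 4.2 percent of expert capacity, while the quoted 21 billion active of 295 billion total parameters is about 7.1 percent, the gap being shared always-on weights. A toy top-k router illustrating the generic gating mechanism, not Tencent's implementation:

```python
import math
import random

NUM_EXPERTS = 192  # experts in the Hy3 configuration, per the announcement
TOP_K = 8          # experts activated per token

def route(logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Select the top-k experts by router logit and softmax-normalize their gates."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
token_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(token_logits)

print(f"expert sparsity: {TOP_K / NUM_EXPERTS:.1%}")  # ~4.2% of experts fire
print(f"param sparsity:  {21 / 295:.1%}")             # ~7.1% of weights touched
```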
The benchmark results Tencent is most proud of concern agentic workflows, the kind of multi-step tasks that matter for real applications rather than standardized tests. According to Jianshi App, which spoke with people familiar with the internal deployment, Hy3 has powered agent workflows of up to 495 steps in Tencent's own products, with a reported success rate above 99.99 percent on tasks using the company's CodeBuddy and WorkBuddy tools. First-token latency dropped 54 percent compared to the prior generation, and end-to-end task duration fell 47 percent.
The SWE-bench number is where independent data exists. The benchmark's validated leaderboard shows Tencent's 74.4 percent just below QWQ-32B at 75.6 percent and tied with o1 at 74.4 percent, a competitive position relative to both open-source and frontier closed models. The LiveCodeBench-v6 score of 34.86 is self-reported against a table Tencent compiled, not a public leaderboard, which means it cannot be independently verified without running the same evaluation suite.
The OpenClaw benchmark numbers are the most curious gap. Tencent cited agentic performance figures attributed to OpenClaw's evaluation framework, an independent benchmark run by OpenClaw rather than by Tencent. Citing an external evaluator carries more weight than internal testing. But the specific numbers appeared in Tencent's announcement, not on the OpenClaw leaderboard, and OpenClaw did not independently publish results for Hy3 as part of the model release.
Yao's trajectory is what makes the timeline surprising. He spent years at OpenAI on reasoning and agentic systems before leaving, and joining Tencent in late 2025 meant inheriting an existing Hunyuan codebase and team. The sub-three-month training run suggests Yao did not start from scratch but rather built on an existing foundation. That context matters for the question the industry is quietly asking: can you transplant AI capabilities by hiring the people who built them, or does the moat live in the organization, not the individuals?
The answer, for now, is that Hy3 posts competitive numbers and the weights are available to test that claim. Whether the performance holds outside Tencent's own infrastructure is exactly the kind of question open weights exist to answer. The model card is live, the benchmarks are published, and the code is downloadable. The verification is not done. It is just available.