The Gamer Who Built a 3D AI Unicorn and the World Model That Remembers
Simon Song plays Civilization late into the night. The 29-year-old Forbes 30 Under 30 Asia alum is upfront about it: gaming is not a metaphor in his origin story, it is the origin. He spent his Johns Hopkins years playing strategy games, then went to SenseTime, then co-founded the large language model company MiniMax, then left in 2022 to build what he actually cared about: a better way to make 3D content.
On Monday, his company VAST, operator of the Tripo AI platform, closed Series A+ and Series A++ rounds totaling roughly $200 million, according to people familiar with the deal. Backed by Ince Capital, Genesis Capital, Primavera Capital Group, and existing investors including Alibaba, the company is now valued at unicorn levels, Forbes reported. A VAST spokesperson declined to comment on valuation specifics.
The money is not the story.
The story is that Tripo AI — which turns text prompts and images into 3D objects — has crossed into production pipelines at companies like NetEase and Sony, while simultaneously building something that sounds like science fiction but is increasingly engineering reality: a world model that remembers.
Project Eden, announced alongside the funding, uses a three-layer architecture that separates the underlying state of a 3D environment (geometry, object identity, event logic) from the visual rendering. Objects persist when the camera looks away. Multiple users share the same evolving world state while each receives an individually rendered view. It is the feature that sounds trivial when described and turns out to be the hardest problem in spatial AI.
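To make the idea of separating world state from rendering concrete, here is a minimal Python sketch. It is not VAST's code and does not attempt to reproduce Project Eden's three layers, whose internals the company's announcement does not spell out here. Every name in it (WorldObject, WorldState, Camera, render) is hypothetical, and the world is reduced to positions on a single axis. It shows only the general principle: one shared state that outlives any camera, and a render step that reads the state without changing it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorldObject:
    """Hypothetical persistent object: a stable identity plus a position."""
    object_id: str
    x: float  # one axis only, to keep the sketch small


@dataclass
class WorldState:
    """Shared state: exists whether or not any camera is looking at it."""
    objects: Dict[str, WorldObject] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)

    def add(self, obj: WorldObject) -> None:
        self.objects[obj.object_id] = obj
        self.events.append(f"add {obj.object_id}")

    def move(self, object_id: str, new_x: float) -> None:
        # Changes go through the state, never through a view.
        self.objects[object_id].x = new_x
        self.events.append(f"move {object_id}")


@dataclass
class Camera:
    """Hypothetical per-user viewpoint: sees a window of the axis."""
    center: float
    half_width: float


def render(state: WorldState, camera: Camera) -> List[str]:
    """Return the ids visible to this camera. Reads state, never mutates it."""
    return sorted(
        obj.object_id
        for obj in state.objects.values()
        if abs(obj.x - camera.center) <= camera.half_width
    )


def demo() -> None:
    state = WorldState()
    state.add(WorldObject("coffee_cup", x=1.0))

    user_a = Camera(center=0.0, half_width=2.0)
    user_b = Camera(center=10.0, half_width=2.0)

    # Two users, one world: each gets a view computed from the same state.
    print(render(state, user_a), render(state, user_b))

    # The cup leaves user A's view but still exists in the shared state.
    state.move("coffee_cup", 9.0)
    print(render(state, user_a), render(state, user_b))


if __name__ == "__main__":
    demo()
```

Because render only reads the state, looking away cannot delete or duplicate the cup. That is the property a purely frame-by-frame generator has no built-in way to guarantee. The hard part in a real system, which this sketch skips entirely, is producing and updating such a state from generated 3D content.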
The standard approach in generative video, whether from OpenAI's Sora or Runway, produces convincing frames but cannot maintain a coherent world across time. Look away and return; the coffee cup that was on the table may have drifted, multiplied, or vanished. World Labs, the $1 billion spatial intelligence company founded by AI pioneer Fei-Fei Li, has made solving exactly this persistence problem its stated mission.
VAST claims to have solved it. The company has published its architecture and its research openly, with work accepted at CVPR, SIGGRAPH, and ICCV. Whether the claims hold under independent evaluation is the question the next phase of this race will answer.
The commercial infrastructure is more immediately verifiable. Tripo AI generates production-ready polygon meshes in as little as two seconds, according to the company, which calls this a 100x improvement over earlier workflows. The claim has not been independently benchmarked, and the gaming industry has historically treated AI 3D generation with a skepticism that borders on the reflexive. But VAST already has enterprise contracts with NetEase, where players in the RPG Where Winds Meet can upload photos to generate interactive 3D avatars, and with Sony, whose exact scope of use the company has not disclosed.
The company says it has 20 million users across the globe, with subscription plans starting at $20 per month and ranging to $140 for professional tiers. API clients number in the tens of thousands. In a September 2025 interview with VoxelMatters, Song said the business was profitable, running at roughly $1 million per month in revenue. User and client counts have grown substantially since then.
One smaller studio that has tested the platform is Humble Mill, an indie game developer in Vancouver. Founder Marcus Chen used Tripo AI to prototype environment assets for a roguelike project he builds solo. The speed was useful, he said, but the output still required significant manual cleanup before assets could enter a game engine pipeline. "It's a fast sketch tool, not a drop-in replacement for a 3D artist," Chen told type0. "Good enough for ideation. Not good enough to ship without work." His experience tracks with the broader industry posture: cautiously curious, not yet converted.
The competitive landscape is not empty. Tencent operates Hunyuan 3D. Silicon Valley startup Meshy serves Western markets. Luma AI competes on ease of use. NVIDIA's 3D tooling sits inside its broader ecosystem play. The question is not whether AI can generate 3D content — it demonstrably can — but whether the quality, speed, and pipeline integration are good enough for production use rather than prototype amusement.
For VAST, the bet is that the transition from novelty to infrastructure is happening now, and that the window for building a defensible position in 3D generation is narrower than it looks. The $200 million will fund research and recruitment. The real proof will be what ships in the next twelve months.