The most interesting part of Current AI’s Open Source AI Gap Map v0.1 is not the map. It is the MIT-licensed repository behind it: 1,184 YAML files, a CSV of 16,185 GitHub repositories, the notebooks, and the schema used to gather everything. The interactive visualization makes those numbers legible to a casual reader. The dataset makes them argument-proof.
Current AI, a non-profit founded at the February 2025 AI Action Summit in Paris with roughly $400 million in committed capital, presents the Gap Map as part of a broader public-option framing for AI. That framing is company-stated and not independently audited, so it is not the story here. The schema question is what outlives any single launch post.
In v0.1, Current AI scored 421 open source AI products in depth across 14 categories and three stack layers (model components, product / UX, and infrastructure). The breakdown, drawn from Simon Willison’s read of the release, is 266 software tools and libraries, 85 models, 50 datasets, and 20 hardware projects, produced by 228 organizations. Outside the scored 421 sits a long tail of about 24,400 artifacts that the project is tracking but has not yet researched or cited. They are in the CSV, not the map.
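The arithmetic behind those figures is easy to check. The sketch below uses only the counts quoted above; the variable names are illustrative, and the tail figure is treated as the approximate 24,400 stated in the release, assumed to be disjoint from the scored set.

```python
# Counts quoted in the article for the 421 deeply scored products.
breakdown = {
    "software tools and libraries": 266,
    "models": 85,
    "datasets": 50,
    "hardware projects": 20,
}

# The four groups should add up to the scored total.
scored_total = sum(breakdown.values())

# Share of each group within the scored set, as a percentage.
shares = {kind: round(100 * count / scored_total, 1) for kind, count in breakdown.items()}

# Approximate long tail of tracked but unscored artifacts (assumed disjoint).
long_tail = 24_400
scored_share_of_tracked = round(100 * scored_total / (scored_total + long_tail), 1)

print(scored_total)
print(shares)
print(scored_share_of_tracked)
```

On these numbers the four groups do sum to 421, hardware is about 4.8% of the scored set, and the scored set is roughly 1.7% of everything the project tracks.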
That asymmetry is the mechanism worth covering. A taxonomy with explicit gaps is more honest than one without them. Because the raw data is open, outside researchers can compare the scored 421 against the 24,400 uncategorized tail, audit which categories carry the weight, and propose new ones without asking Current AI for permission. The AGENTS.md in the repo signals the project expects that: it instructs downstream agents how to consume and extend the catalog, treating the schema itself as infrastructure.
Three design choices shape what the map can and cannot reveal. First, the three-layer split (model components, product / UX, infrastructure) is a 2026 reading of the AI stack. Hardware ends up as its own category within infrastructure, and only 20 of the 421 deeply scored entries are hardware projects, a hint that v0.1 is software-heavy rather than balanced. Second, the boundary between “scored in depth” and “in the long tail” is the project’s, not the ecosystem’s. The 24,400 carry no score until they are researched and cited, which means the long tail is visible without being authoritative. Third, the data is MIT-licensed, not just open: the methodology page and the repo’s README point anyone building on it back to a permissive license that allows commercial forking, redistribution, and reuse without registering with the non-profit.
For a toolbuilder, the immediate use is filtering. The catalog CSV is already loadable into Datasette Lite, so a builder looking for production-ready inference runtimes, evaluation harnesses, or model-serving frameworks does not need to wait for a curated filter; they can run their own. For a researcher, the YAML files are the more useful artifact: each scored entry carries the metadata Current AI used to make the call, which becomes a baseline for replicating or contesting those calls. For a policymaker, the long-tail CSV is a candidate starting point for measuring concentration, mapping where dependencies stack, and seeing where commercial open source actually meets non-profit infrastructure.
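A self-serve filter of the kind described for toolbuilders can be sketched with nothing beyond Python's standard library. The column names (`name`, `category`, `stars`), the category labels, and the sample rows below are all hypothetical, invented for illustration; the real catalog's schema should be read from the repository before adapting this.

```python
import csv
import io

# Hypothetical stand-in for the catalog CSV. Columns and rows are invented
# for illustration and do not reflect the real file's schema or contents.
SAMPLE_CSV = """name,category,stars
example-runtime,inference-runtime,1200
example-eval,evaluation-harness,300
example-server,model-serving,4500
example-tiny-runtime,inference-runtime,15
"""


def load_rows(handle):
    """Read every CSV row into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def filter_rows(rows, category, min_stars=0):
    """Keep rows in the given category with at least min_stars stars."""
    return [
        row
        for row in rows
        if row["category"] == category and int(row["stars"]) >= min_stars
    ]


rows = load_rows(io.StringIO(SAMPLE_CSV))
runtimes = filter_rows(rows, "inference-runtime", min_stars=100)
print([row["name"] for row in runtimes])
```

Pointing `load_rows` at an opened copy of the real CSV, with the column names swapped for whatever the schema actually defines, would give the same kind of ad hoc filter without waiting on a curated view.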
The watch item is pace. v0.1 launched days before Willison’s 3 July 2026 link blog surfaced it. Open source AI moves in patch releases: a taxonomy that updates monthly will be useful, while one that updates quarterly will be a relic. Current AI has not announced a release cadence. The non-profit’s own framing positions the project as ongoing infrastructure, but the v0.1 label signals the maintainers know v0.2 will look different.
That is also why the dataset matters more than the screenshot. A frame is a moment. A repo is a thing someone else can patch.