Google Ends the License Drama That Drove Developers Away
Google's open-weight model strategy just got a lot less restrictive. Gemma 4, released this week, ships under Apache 2.0, a commercially permissive license that replaces the custom Gemma terms that governed earlier releases. For developers and companies that sidestepped Gemma 3 because of licensing uncertainty, this removes the biggest reason to look elsewhere.
The license change arrives alongside genuine technical progress. Gemma 4 introduces native function calling at the model level: not a bolted-on tool wrapper, but a first-class capability the model was trained to perform. Together with structured JSON output and native system instructions, this makes the family built for agentic workflows from the ground up rather than retrofitted for them. The 31B variant currently ranks #3 among open models on the Arena AI text leaderboard, behind GLM-5 and Kimi 2.5. The 26B mixture-of-experts model, which activates only 3.8 billion of its total parameters during inference, sits at #6, ahead of models twenty times its size on the same leaderboard.
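What "native function calling plus structured JSON output" buys an application is that the model's reply can be parsed and dispatched directly. The source does not specify Gemma 4's tool-call wire format, so the minimal sketch below assumes the model emits a JSON object of the form `{"name": ..., "arguments": {...}}`; the `get_weather` tool, the `TOOLS` registry, and `dispatch` are hypothetical names for illustration only.

```python
import json

# Hypothetical tool: a stub standing in for a real weather API call.
def get_weather(city, unit="celsius"):
    return {"city": city, "temp": 21, "unit": unit}

# Hypothetical registry: tool name -> (callable, names of required arguments).
TOOLS = {"get_weather": (get_weather, {"city"})}

def dispatch(model_output):
    """Parse a JSON tool call (assumed shape: {"name": ..., "arguments": {...}}) and run it."""
    call = json.loads(model_output)
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    fn, required = TOOLS[name]
    missing = required - set(args)  # required argument names the model left out
    if missing:
        raise ValueError(f"missing arguments for {name}: {sorted(missing)}")
    return fn(**args)

# A structured call the model might emit; the exact format is an assumption.
print(dispatch('{"name": "get_weather", "arguments": {"city": "Zurich"}}'))
```

In a full agent loop, the returned dict would be serialized and passed back to the model as the tool result. The validation branches are still worth keeping even with a model trained for function calling, since a malformed or unexpected call should fail loudly rather than silently.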
The edge story is where it gets concrete for hardware-constrained deployments. The E2B model — effective 2 billion parameters with a 128K context window — runs at 133 tokens per second prefill on a Raspberry Pi 5, with decode throughput of 7.6 tokens per second. That is not a demo number; it is fast enough for interactive use on a $50 single-board computer. The E4B variant delivers similar performance at 4 billion effective parameters. Both models process text, images, and audio natively, and are forward-compatible with Gemini Nano 4 via the AICore Developer Preview on Android.
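The two throughput figures answer different questions: prefill speed governs how long the model takes to read a prompt, and decode speed governs how fast it writes the reply. A back-of-envelope sketch using the E2B Raspberry Pi 5 numbers above, assuming both rates stay constant (in practice decode usually slows as the context grows) and ignoring model load time; `estimate_latency` and the request sizes are hypothetical.

```python
def estimate_latency(prompt_tokens, output_tokens, prefill_tps=133.0, decode_tps=7.6):
    """Rough latency in seconds, assuming constant prefill and decode rates."""
    time_to_first_token = prompt_tokens / prefill_tps  # approx.: read the whole prompt once
    generation_time = output_tokens / decode_tps       # then emit tokens one at a time
    return time_to_first_token, generation_time, time_to_first_token + generation_time

# Hypothetical request: a 500-token prompt (system instructions plus user turn)
# and a 100-token reply.
ttft, gen, total = estimate_latency(500, 100)
print(f"first token after ~{ttft:.1f}s, reply done after ~{total:.1f}s")
# prints: first token after ~3.8s, reply done after ~16.9s
```

At these rates, reply length dominates: the 100 generated tokens take about 13 seconds, versus under 4 seconds to ingest the prompt.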
On the infrastructure side, NVIDIA has listed OpenClaw as a featured agent platform compatible with Gemma 4 across RTX PCs, workstations, and the DGX Spark personal AI supercomputer. NVIDIA's blog post links to a dedicated guide for running OpenClaw on RTX GPUs and DGX Spark, alongside a DGX Spark OpenClaw playbook on NVIDIA's model hub. This is not a peripheral mention; it is a prominent placement in NVIDIA's Gemma 4 coverage.
Architecturally, Gemma 4 introduces Per-Layer Embeddings (PLE), a second embedding table that feeds a small residual signal into every decoder layer, and Shared KV Cache, which lets the last N layers reuse key-value states from earlier layers, reducing redundant KV projections. Hugging Face's technical analysis notes that these choices make the models well-suited for quantization — a critical property for deployment on memory-constrained edge hardware.
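Shared KV Cache is easiest to see as bookkeeping: some layers stop computing and storing their own keys and values and read another layer's instead. The sketch below assumes the simplest mapping, where every one of the last N layers reuses the cache of the last layer that still computes its own K/V; the real model may pick the source layer differently (for example, per attention type). The layer counts and dimensions are hypothetical, not Gemma 4's published configuration, and Per-Layer Embeddings are not modeled here.

```python
def kv_source_layers(num_layers, num_shared):
    """Map each decoder layer to the layer whose K/V cache it attends over.

    Simplifying assumption: all of the last `num_shared` layers reuse the cache
    of the last layer that still computes its own K/V projections.
    """
    if not 0 <= num_shared < num_layers:
        raise ValueError("num_shared must be in [0, num_layers)")
    last_own = num_layers - num_shared - 1
    return [min(i, last_own) for i in range(num_layers)]


def kv_cache_bytes(num_layers, num_shared, seq_len, num_kv_heads, head_dim, bytes_per_elem=2):
    """Cache size for one sequence; shared layers store no K/V of their own."""
    layers_with_own_cache = len(set(kv_source_layers(num_layers, num_shared)))
    per_layer = 2 * seq_len * num_kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return layers_with_own_cache * per_layer


print(kv_source_layers(6, 2))  # [0, 1, 2, 3, 3, 3]

# Hypothetical dimensions; bf16 means 2 bytes per element.
dims = dict(seq_len=4096, num_kv_heads=2, head_dim=256)
print(kv_cache_bytes(24, 0, **dims) // 2**20, "MiB without sharing")       # 192
print(kv_cache_bytes(24, 8, **dims) // 2**20, "MiB with 8 shared layers")  # 128
```

Under these assumptions, sharing the last third of the layers cuts KV-cache memory by a third and skips the same fraction of K/V projections, which is the kind of saving that matters on memory-constrained edge devices.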
The download history tells its own story. Since the first generation, developers have pulled Gemma over 400 million times and built more than 100,000 variants on Hugging Face.* That ecosystem did not grow only because the models were good; it grew because the weights were open. The Apache 2.0 switch signals that Google understands the difference and is betting that the community will build more aggressively on Gemma 4 than it did on its predecessors.
The 26B MoE architecture deserves a closer look from anyone running inference at scale. Activating only 3.8 billion parameters per token while delivering output quality comparable to dense models is a meaningful efficiency gain. If the benchmark numbers hold in production, that is a better tradeoff than running a full 26B dense model for most agentic tasks.
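The source does not describe Gemma 4's router (how many experts it has or how many fire per token), so the sketch below uses generic top-k softmax gating, a common mixture-of-experts scheme, with hypothetical toy experts and k=2. It shows why only a fraction of the parameters does work for any given token, then applies the article's 3.8B-active / 26B-total figures.

```python
import math

def top_k_route(router_logits, k):
    """Select the k highest-scoring experts and softmax-normalize their gate weights."""
    chosen = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts; unselected experts do no work for this token."""
    chosen, weights = top_k_route(router_logits, k)
    out = [0.0] * len(x)
    for idx, w in zip(chosen, weights):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# Hypothetical toy experts: expert i scales its input by (i + 1).
# The default argument binds i at definition time (avoids late binding).
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(4)]
print(top_k_route([1.0, 3.0, 2.0, 0.5], k=2)[0])  # [1, 2]

# Figures from the article: 3.8B active out of 26B total parameters.
TOTAL, ACTIVE = 26e9, 3.8e9
print(f"active fraction {ACTIVE / TOTAL:.1%}, ~{TOTAL / ACTIVE:.1f}x less compute per token than dense 26B")
# prints: active fraction 14.6%, ~6.8x less compute per token than dense 26B
```

The saving is in per-token compute (and, roughly, in weights read per decode step), not in memory footprint: all experts must still be available, so the full parameter set has to fit in, or be paged into, memory.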
For builders: the combination of native function calling, Apache 2.0 licensing, and usable decode speeds (7.6 tokens per second for E2B on a Raspberry Pi 5) is not a coincidence. Google has been paying attention to what the open-source agent community has been asking for. The question is whether Gemma 4's license clarity arrives early enough to capture the wave of agentic deployments that are now economically viable on edge hardware.
* Gemma download and variant-count figures are from Google's announcement post.