Andrej Karpathy hasn't written a line of production code since December. He doesn't consider this a loss.
"I kind of went from 80/20 to like 20/80 of writing code by myself versus just delegating to agents," Karpathy said on the No Priors podcast, released March 30. "I don't even think it's 20/80 by now. I think it's a lot more than that. I don't think I've typed a line of code probably since December."
That sentence will get picked up and passed around because it sounds extreme. It is extreme. But the interesting thing about the No Priors conversation is that Karpathy, the OpenAI cofounder and former Tesla Autopilot lead, has the receipts to back it up—and the intellectual honesty to show where the receipts run out.
The most concrete evidence is his overnight auto-research run: a system that submits batches of experiments to a GPU cluster and evaluates the results without human intervention. It ran 126 trials while Karpathy slept and drove validation bits-per-byte on his nanoGPT model from 0.9979 to 0.9697, a 2.8 percent improvement in a codebase he'd already spent years hand-tuning. The largest single find was weight decay on the value embeddings, a hyperparameter Karpathy admitted he had simply forgotten to set. "I did forget the weight decay on the value embeddings and my Adam betas were not sufficiently tuned," he said.
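For context on the metric: bits-per-byte is cross-entropy loss converted from nats per token into bits per byte of raw text, so lower means the model compresses the validation set better. A minimal sketch of the conversion, using placeholder numbers that are not from the run:

```python
import math

def bits_per_byte(ce_loss_nats, total_bytes, total_tokens):
    """Convert mean cross-entropy (nats per token) to bits per byte of text."""
    nats_total = ce_loss_nats * total_tokens   # total nats over the eval set
    bits_total = nats_total / math.log(2)      # nats -> bits
    return bits_total / total_bytes            # normalize by raw byte count

# Placeholder numbers, not from the run: 1.0 nat/token, ~4.8 bytes/token.
bpb = bits_per_byte(1.0, total_bytes=4800, total_tokens=1000)
print(round(bpb, 4))
```

Normalizing by bytes rather than tokens is what makes the number comparable across tokenizers.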
Scaling to a 16-GPU cluster, the same system ran roughly 910 experiments in eight hours, according to SkyPilot's writeup of the setup. The cluster found that scaling model width mattered more than any single hyperparameter knob: val_bpb dropped from 1.003 to 0.974, a roughly 2.9 percent improvement. Karpathy called it a proof of concept for removing himself from the research loop entirely.
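Stripped of the GPU plumbing, the loop behind such a run is simple: sample a configuration, train, read back one scalar, keep the best. The sketch below is hypothetical; `SEARCH_SPACE`, `train_and_eval`, and the toy scoring inside it are illustrative stand-ins, not the actual system. A real `train_and_eval` would submit a cluster job and parse val_bpb from its logs.

```python
import random

SEARCH_SPACE = {
    "value_emb_weight_decay": [0.0, 0.01, 0.1],  # the knob he'd forgotten to set
    "adam_beta2": [0.95, 0.99, 0.999],
    "model_width": [512, 768, 1024],
}

def sample_config(rng):
    """Draw one random configuration from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def train_and_eval(config):
    """Stand-in for a real training run; returns a deterministic fake val_bpb."""
    rng = random.Random(str(sorted(config.items())))
    bonus = 0.01 if config["value_emb_weight_decay"] > 0 else 0.0
    bonus += (config["model_width"] - 512) / 1024 * 0.02  # width helps most here
    return 1.003 - bonus - rng.uniform(0, 0.005)

def overnight_run(n_trials, seed=0):
    """Run n_trials experiments unattended; keep the lowest val_bpb seen."""
    rng = random.Random(seed)
    best_bpb, best_cfg = float("inf"), None
    for _ in range(n_trials):
        cfg = sample_config(rng)
        bpb = train_and_eval(cfg)
        if bpb < best_bpb:
            best_bpb, best_cfg = bpb, cfg
    return best_bpb, best_cfg

best_bpb, best_cfg = overnight_run(126)
improvement = (1.003 - best_bpb) / 1.003 * 100
print(f"best val_bpb={best_bpb:.4f} ({improvement:.2f}% better) with {best_cfg}")
```

The point of the sketch is the shape, not the search strategy: the entire human role reduces to defining the space and reading the winner in the morning.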
The labor market numbers give the anecdote a different weight. Employment for software developers aged 22 to 25 declined nearly 20 percent since ChatGPT's release in late 2022, according to a Stanford Digital Economy Study published in August 2025. Overall U.S. programmer employment fell 27.5 percent between 2023 and 2025, per Bureau of Labor Statistics data cited by Forbes. Job postings for entry-level and junior developer roles have dropped roughly 40 percent compared to pre-2022 levels. Karpathy, asked about the junior developer cohort, was direct: "I think this is table stakes. This is like any AI, even the open source models, can like do this. You should be able to translate from a less technical human's intent very easily to this outcome."
The joke is where the ceiling shows.
Karpathy's most telling moment came when he described what happens when you ask a state-of-the-art model to be funny. "If you go to like state-of-the-art model and you ask it, tell me a joke. Do you know what joke you're going to get? There's the joke. I can't tell you like the standard form of it, but I do feel like ChatGPT has like three jokes." The specific joke—why do scientists not trust atoms? Because they make up everything—appeared in Forbes's reporting on the same interview, where Karpathy noted it's the same joke from three to four years ago. "It's outside of the reinforcement learning. It's outside of what's being improved. It's part of the jaggedness."
This is not a minor observation. It's the clearest articulation of what Karpathy calls "jagged intelligence"—the property where models are brilliant inside verifiable domains (code, math, formal reasoning) but meandering and stuck outside them. Humor is one data point. Asking clarifying questions is another. "I think they have a tough time with nuance of maybe what I had in mind or what I intended and when to ask clarifying questions," Karpathy said. "Anything that feels softer is worse."
The implication cuts both ways. If intelligence is jagged, then the things AI does well get done genuinely, durably well—and the things it does poorly may stay poor for a long time, no matter how much overall capability improves. The joke doesn't get better because there's no reward signal for jokes. No unit test for humor. The RL-verifiable domains get optimized; everything else plateaus.
Karpathy has his own answer to this: microGPT, his ongoing obsession with boiling a GPT model down to its absolute essentials. The result is roughly 200 lines of Python with no dependencies—no PyTorch, no CUDA, just the algorithm. Dataset loading, tokenizer, autograd engine, GPT-2 architecture, Adam optimizer, training loop, inference. He asked an AI to explain it and found the agent could follow the code perfectly. "It can't come up with it, but it totally gets it and understands why it's done in a certain way." His contribution, he says, is the 200 lines. The rest is already commodity.
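Of the essentials on that list, the autograd engine is the one that fits in a few dozen lines. Below is a minimal scalar version in that reductionist spirit; it is an illustrative sketch, not Karpathy's actual code, and supports only add, multiply, and tanh.

```python
import math

class Value:
    """A scalar that records its computation graph for backpropagation."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# d(x*y + y)/dx = y = 3,  d(x*y + y)/dy = x + 1 = 3
x, y = Value(2.0), Value(3.0)
z = x * y + y
z.backward()
print(x.grad, y.grad)  # 3.0 3.0
```

Everything else on the list—tokenizer, architecture, optimizer, training loop—stacks on top of this one mechanism, which is why it can all fit in 200 dependency-free lines.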
He also built Dobby.
Dobby is Karpathy's home automation system, named after the elf from Harry Potter. It runs on a local language model with a camera pointed at his driveway, a change-detection layer, and a vision model that watches for visitors. FedEx pulled up last week and Dobby sent him a WhatsApp message with an image: "Hey, a FedEx truck just pulled up." It controls his Sonos speakers, lights, HVAC, shades, pool, spa, and security system. He decommissioned six separate apps. "Instead of six apps, I just text Dobby."
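The pipeline Dobby describes reduces to a gate: cheap change detection decides when to wake the expensive vision model, which decides when to message. The sketch below is entirely hypothetical—the list-of-ints frames, the threshold, and the `describe_frame` and `notify` stubs are assumptions standing in for a camera feed, a local vision-language model, and a WhatsApp integration.

```python
def frame_delta(prev, curr):
    """Mean absolute pixel difference between two flattened grayscale frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

def describe_frame(frame):
    """Stand-in for a local vision-language model call."""
    return "Hey, a FedEx truck just pulled up."

def notify(message):
    """Stand-in for a WhatsApp (or any messaging) integration."""
    print(f"Dobby: {message}")

def watch(frames, threshold=10.0):
    """Only run the vision model when the scene actually changes."""
    alerts = []
    prev = frames[0]
    for curr in frames[1:]:
        if frame_delta(prev, curr) > threshold:
            msg = describe_frame(curr)
            notify(msg)
            alerts.append(msg)
        prev = curr
    return alerts

# Toy frames: flat driveway, then a truck-sized change, then static again.
frames = [[0] * 64, [0] * 64, [200] * 64, [200] * 64]
alerts = watch(frames)
```

The gating is the design choice that makes a local model practical: the language model only runs on the handful of frames where something happened.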
On open source, Karpathy gave an unusually specific timeline. Closed frontier models are ahead, but the gap has compressed. "It started with there's nothing, and then it went to 18 months. Yeah, but even convergence, right? So maybe they're behind by like, what is the latest, maybe like eight months, eight months kind of thing right now." He compared it to Linux: the closed systems are Windows and macOS, but Linux runs most of the internet because the industry needs a common open platform. "I think the same is true now."
The question of why he left the frontier labs received the longest answer of the interview. "I feel like a bit more aligned with humanity in a certain sense outside of frontier lab, because I don't—I'm not subject to those pressures almost, right? And I can say whatever I want." He described the internal pull of staying close to what labs are building, but also the cost: you can't be a completely free agent inside one. "When the stakes are really high, if you're an employee at an organization, I don't actually know how much sway you're going to have on your organization."
The auto-research run is the part worth sitting with. A single researcher, asleep, while a system ran more experiments overnight than he would have run in months. It found something he'd missed. The code it improved was his own. And the metric it optimized—a lower bits-per-byte score on a language model—is exactly the kind of RL-verifiable objective that makes this all work.
Outside that domain, the joke is still the same.