The AI Employee Experiment Is Failing: New Research Shows Treating AI as Colleagues Reduces Error Detection by 18%
When researchers framed AI as a colleague rather than a tool, error detection fell 18 percent and escalation requests rose 44 percent. That is the result of a randomized experiment involving 1,261 managers in HR and finance roles, published in Harvard Business Review in May. It is also, according to data presented at Fortune's COO Summit this week, the model already in use at roughly a third of Fortune 500 companies.
The colleague-versus-tool debate that played out on stage looked like a genuine disagreement. Okta's President and COO Eric Kelleher has named his agents Leo, Sloan, Hank, and Walker and puts them in business reviews alongside human staff; Cisco's EVP of People argued that the colleague frame is categorically wrong. But the disagreement is not genuine, because the research already has an answer. What the executive debate actually exposed is something the productivity statistics have been trying to tell us for two years: the organizational infrastructure to manage AI-assisted work has not caught up with the technology itself.
Cognizant calls this the activation gap. Its New Work, New World 2026 report, released at the same summit, reassessed 18,000 tasks across 1,000 occupations using the O*NET database. It found that 93 percent of jobs are already being disrupted by AI, six years ahead of Cognizant's own 2023 forecast. The researchers estimate that $4.5 trillion in US labor costs is theoretically exposed to a shift to AI. Yet the productivity gains that were supposed to follow have not materialized. The technology moved faster than the organization.
The HBR experiment explains part of why. When workers treat AI as a colleague rather than a tool, they perform less rigorous oversight. The AI employee framing did not just change attitudes; it changed behavior in ways that reduced quality control and let more work pass through review unchecked. Participants in the AI colleague condition reviewed work less carefully and escalated more often, not because they were more diligent but because the framing gave them permission to offload judgment onto the system.
Thirty-one percent of the managers in the HBR study said their company already frames AI as a teammate or employee. Twenty-three percent said their organization lists AI on its official org charts. This is happening across healthcare, financial services, retail, and professional services, not just in technology companies.
Kelleher's instinct to name the agents and put them in reviews is understandable and may serve a real organizational purpose. But the HBR data suggests that what he is really doing is shifting accountability in ways that are difficult to reverse. Once an entity has a name and an org chart slot, it becomes easy to narrate failures as its fault rather than the fault of the humans who deployed or approved its outputs.
Kelleher's actual innovation is not the naming — it is pushing token budgets down to individual managers, forcing a concrete reckoning with digital labor alongside human labor in the budget cycle. The harder problem, as he frames it, is that managers have been trained for decades to think about headcount and payroll, not hybrid workforces. That training gap is where the real cost of the colleague framing lives.
Cisco's experience adds a dimension the productivity data cannot capture. During the same summit, EVP Francine Katsoudas described how Cisco handled 4,000 announced layoffs as part of an AI restructuring. According to Fortune, trust began to drop about nine months in on the teams using AI most effectively. The company responded by investing more in communication and internal redeployment; in previous restructurings, pairing training with redeployment had allowed Cisco to place 75 percent of impacted employees. But the trust erosion on the high-AI teams was not a technology failure. It was an organizational one.
Wayfair President Jon Blotner described a different response: reversing a top-down AI mandate and giving every employee access to Claude, Gemini, and ChatGPT, then watching teams begin reinventing their own roles. According to Fortune, the employees who automated their own workflows became, in his framing, incredibly valuable. That model does not resolve the accountability question — it sidesteps it by distributing agency broadly rather than formalizing it in the org chart.
The deeper issue is that the colleague-versus-tool debate is the wrong question to be having in public. The research suggests that what companies call AI in their org charts matters far less than what they hold humans accountable for when the AI makes a mistake. The companies getting this right are not the ones with the most sophisticated AI or the most progressive naming conventions. They are the ones that have rebuilt their oversight structures so that the humans in the loop actually check the work — regardless of what the system is called.
The activation gap will not close until the accountability gap does.