The humans paid to grade the next generation of AI are using AI to do it. That is not a punchline. It is the equilibrium a buyer built into the contract.
Multiple contractors who produce training data for large language models, the AI systems behind ChatGPT and similar tools, told New Scientist that they routinely outsource the work they are paid to do back to the very chatbots they are evaluating. The practice, described by one worker as "very widespread" across every company she has worked for, is exactly the inversion the industry's training pipeline cannot afford: the data meant to make models more human is being laundered through models that already exist.
The workers are not rogue actors. They are economic actors responding to the terms they have been given. AI firms increasingly route this work to third-party contractors, often part-time and paid piecework, with no full-time contract and no benefits. The platforms that employ them, including the AI-training contractor marketplace Outlier, ask them to hold long conversations with AI, write tests of model behavior, or grade AI outputs. Companies have explicit policies against using chatbots to do the work and use workforce-monitoring software such as Hubstaff, which takes periodic screenshots, to catch workers who do. The fact that the policy exists is itself the confession: the buyer has priced in the agency loss and is paying auditors to recover some fraction of it.
The mechanism is simple. A contractor paid by the task to produce a graded conversation is racing a throughput target. The fastest way to hit the target is to prompt a chatbot, strip the telltale phrasings, and submit. The buyer cannot easily verify the provenance of the output, because the deliverable is a string of text that reads like something a human wrote. The contractor is the only party who knows the chain of custody, and the contractor's incentive is to break it. This is not a moral failure unique to AI data work. It is the standard piecework result: when the rate is too low to justify the labor, the labor migrates to the cheapest available substitute.
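The piecework arithmetic can be made concrete with a rough sketch. Every figure below is a made-up assumption chosen for illustration (the per-task rate, the minutes per task, the wage a worker is aiming for); none of it comes from the reporting, and the function names are hypothetical.

```python
# Illustrative piece-rate arithmetic. Every number here is an assumption
# invented for the example, not a figure from the reporting.


def hourly_earnings(rate_per_task: float, minutes_per_task: float) -> float:
    """Effective hourly pay for a worker paid a flat rate per task."""
    return rate_per_task * 60 / minutes_per_task


def break_even_rate(target_hourly_wage: float, minutes_per_task: float) -> float:
    """Per-task rate at which the work pays the target hourly wage."""
    return target_hourly_wage * minutes_per_task / 60


RATE_PER_TASK = 5.00         # hypothetical dollars paid per task
MINUTES_BY_HAND = 30         # hypothetical time to do a task honestly
MINUTES_WITH_CHATBOT = 10    # hypothetical time to prompt, edit, submit
TARGET_HOURLY_WAGE = 15.00   # hypothetical wage the worker needs to earn

by_hand = hourly_earnings(RATE_PER_TASK, MINUTES_BY_HAND)
with_chatbot = hourly_earnings(RATE_PER_TASK, MINUTES_WITH_CHATBOT)
needed_rate = break_even_rate(TARGET_HOURLY_WAGE, MINUTES_BY_HAND)

print(f"By hand:      ${by_hand:.2f}/hour")
print(f"With chatbot: ${with_chatbot:.2f}/hour")
print(f"Per-task rate needed for honest work to hit the target: ${needed_rate:.2f}")
```

On these assumed figures, working by hand pays $10 an hour, below the $15 target, while the shortcut pays $30. The per-task rate would have to rise from $5 to $7.50 before the honest path clears the target. The sketch deliberately ignores the risk of being caught and losing pay, which would lower the shortcut's expected return.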
Bob*, another contractor who spoke to New Scientist, described a trajectory familiar to anyone who has watched a cheat-detection operation mature: he used AI to complete the work, was identified, and was then promoted into the team that catches other workers using AI. The promotion is the market telling him, and his employer, that the cheat is now structural. The job has been split in two. There is the work, and there is the verification of the work, and the verification is the only part the buyer trusts.
The second-order risk, flagged by the contractors themselves and by the researchers who study training pipelines, is a feedback loop. If human-curated training data is itself AI-generated, the next generation of models is being trained on data sold under a provenance claim the supplier cannot fulfill and the buyer cannot verify. In adjacent research literature this dynamic is called model collapse, and the reporting presents it as a risk to quality rather than a measured outcome. The honest framing is that the buyer is paying a premium for a category, "human-curated," that the unit economics of the contract are systematically converting into the thing it claims not to be.
What makes the cheat durable is that it is invisible to the buyer at the point of purchase and visible to the contractor at the point of production. Alice*, who spoke to New Scientist, offered the line that reframes the entire dispute: "If these companies want quality data, then they should offer quality contracts." The line lands because it is symmetric. It does not absolve the workers, and it does not condemn them. It places the agency where the contract placed the budget.
The policy implication is not that AI labs need better cheat-detection. They already have cheat-detection, and the cheat continues. The policy implication is that the unit price of human-curated training data is the variable, and the variable is set by the buyer. Until the rate is high enough that a worker can hit the throughput target at the pace honest labor allows, the rate of substitution will keep climbing, and the provenance claim printed on the invoice will keep drifting further from the provenance of the bytes delivered.
The next thing to watch is whether any frontier lab publishes a third-party audit of what share of its training data was actually produced by the humans whose labor it was sold as. Until that number exists, every "human-curated" label is a billing category, not a quality guarantee.