When the Audit File Becomes the AI Memory
Caseware launched Verity on May 20. The product automates routine audit tasks and generates citation-backed suggestions that enter the official engagement file: the regulated record that regulators and peer reviewers eventually examine. The question the launch doesn't answer: what exactly is human judgment when the AI's reasoning is inside the file too?
The audit engagement file is the closest thing an accounting firm has to a constitution. Every decision, every judgment, every note about why the auditor chose to accept or reject a particular piece of evidence lives there. It is what regulators read when something goes wrong, what peer reviewers examine during inspection season, what lawyers request in a malpractice case. The file is the proof that a human professional applied human judgment to a specific set of facts.
Caseware just made that file AI-native.
The Toronto-based audit software company launched Verity on May 20, an AI platform embedding workflow agents directly inside the engagement file rather than operating beside it. The Disclosure Checklist Agent reviews financial statements and generates citation-backed suggestions that auditors accept, refine, or override within the workflow. The Risk Suggestion Agent draws on multi-year financial data and qualitative materials to propose engagement-specific risks. Every output is traceable before it enters the official record. Caseware invested more than $100 million developing the system over several years, and says early deployments cut certain manual workflows from between 15 and 20 minutes to under two minutes. The company is aiming to automate half of all repetitive engagement elements end-to-end.
That is the product. Here is the question it raises: when an AI agent leaves its reasoning inside the engagement file, what exactly is the human judgment being audited?
The Sidecar Problem
Audit firms are under pressure to adopt AI. The talent pipeline is thin, workloads are heavy, and the Big Four have collectively posted more AI-specialist job listings than auditor positions in English-speaking markets, according to Financial Times analysis of help-wanted data. The incentive to embrace AI-assisted tools is real. The problem is that every major enterprise AI product in the audit space has treated the engagement file as something to protect from the AI, not something to integrate it into.
Copilots sit beside the workflow. Assistants orbit the document. The audit work itself happens in one system; the AI contributes from a comfortable distance, its output reviewed by humans before anything enters the regulated record. This architecture keeps the AI at arm's length, which keeps the accountability clean: a human reviewed it, a human approved it, a human is responsible for it.
Caseware's architecture inverts this. Verity's agents don't contribute suggestions that humans then incorporate. The agents generate outputs that enter the file itself, citation-backed and traceable, as part of the official engagement record. The human reviews and can override, but the AI reasoning is now part of the artifact that regulators and peer reviewers will eventually examine.
The practical difference sounds subtle. The accountability implications are not.
The Regulatory Gap
The PCAOB has acknowledged exactly this tension. In remarks before an industry conference, the board's staff described a hypothetical: a firm uses AI to test 100 percent of journal entries rather than sampling. Inspectors, lacking clear standards, demand extensive documentation about the AI model's behavior. The firm decides manual sampling is less risky from a PCAOB compliance standpoint and goes back to it. The staff's point: unclear guidance on acceptable AI-based audit methodology acts as an anchor on innovation, not an engine for it.
The PCAOB's Technology Innovation Alliance Working Group explicitly discussed agentic AI auditing as a forward state in its recommendations to the board. One pillar recommends the PCAOB develop risk management guidance specifically to help firms responsibly use AI in auditing. The working group also proposed standardized audit documentation structures that could support continuous audit technology ecosystems and make agentic AI adoption easier to inspect and regulate.
That was the PCAOB talking about the theoretical future. Caseware shipped the product on May 20. The regulatory framework that would define what Verity's outputs mean for audit quality, peer review, and legal liability does not yet exist.
What the Beta Numbers Mean
Caseware says Verity achieved 94 percent accuracy on golden dataset items captured from beta traces in early deployments. The Disclosure Checklist Agent saves an average of 2.7 hours of review time per workflow, according to the company's own reporting.
These figures come from Caseware's beta program, with users the company selected and quotes it provided to journalists. No independent researcher has published an evaluation of Verity's accuracy on real-world audit tasks. The golden dataset was built from beta traces, which means the test cases reflect the conditions Caseware's own deployment team observed, not necessarily the full range of messy, ambiguous, incomplete audit situations that occur in practice.
The numbers are directionally consistent with what a well-resourced team with years of development and substantial beta testing could reasonably achieve. They are not independent validation. The 94 percent figure is worth citing, but readers should understand what it is: Caseware's own accuracy measurement, taken under beta deployment conditions, with no independent verification.
The Liability Question Nobody Has Answered
The audit profession carries professional liability insurance structured around the assumption that the work product reflects human professional judgment. Malpractice policies, peer review standards, and state board disciplinary frameworks were all built around a model where the accountable professional is a person, and the documentation reflects that person's reasoning.
When an AI agent generates a citation-backed risk suggestion that enters the engagement file, and an auditor accepts it, the reasoning inside that file is a compound artifact: some of it is the auditor's judgment, some of it is the model's. The current frameworks do not specify how that compound reasoning should be evaluated, who bears responsibility for errors in the AI's contribution, or how peer reviewers should assess the quality of work that included AI-generated analysis.
Insurers have not published guidance on how AI-assisted audit work changes coverage structures. The PCAOB's quality control standard QC 1000, which would have addressed some of these questions, was postponed last year to December 2026, which puts it after Verity's launch, not before it.
Caseware has built what appears to be the most technically rigorous attempt to solve the audit AI accountability problem in the current market. The compliance certifications are real: SOC 2 Type II, ISO 27001, engagement-level data isolation. The citation-and-traceability architecture is more thoughtful than most enterprise AI products in regulated fields. But the legal and professional liability frameworks that would define what Verity's outputs mean in a courtroom or a PCAOB inspection have not caught up to the product.
That gap is not a flaw in Caseware's launch. It is the story.
Caseware declined to name beta clients or provide documentation of the golden dataset methodology. A spokesperson said the 94 percent figure reflects early deployment conditions and that the company expects to publish more detailed accuracy data as the product scales.