The IRS now runs audit selection with machine learning. Small accounting firms are first to feel it.


The Internal Revenue Service reviewing this year's returns is not the one your clients filed against twelve months ago. Audit case selection, third-party data matching, and fraud detection have moved from planning decks into production systems, and the small accounting firms serving individuals and pass-through entities are the first ones who need a new playbook.

The driver is structural. A Government Accountability Office inventory of IRS AI use, together with its June 2025 supplemental product, documents an agency running an order of magnitude more AI use cases in 2025 than it did in 2022, alongside skills gaps and information-quality problems inside the agency. The IRS workforce has contracted by roughly a quarter over the same period, according to industry analysis of agency staffing data, which means machine learning is filling slots that humans no longer occupy rather than augmenting a stable team. That distinction matters for how clients should be advised, and it matters for what the system can and cannot be trusted to do well.

Case selection: from DIF scores to machine learning

The IRS is retiring its legacy statistical scoring system, the Discriminant Information Function (DIF), in favor of machine-learning models for choosing which returns to examine. DIF was a statistical scoring engine that had accumulated a documented no-change problem: a high share of the returns it flagged produced no adjustment after audit, meaning the agency was spending examiner time on returns that did not need it. The Treasury Inspector General for Tax Administration recommended in a 2025 audit that the IRS use actual examination results as training data for the new models and improve how it evaluates their performance. The GAO's AI-and-the-tax-gap blog frames the broader goal: closing the gross tax gap, estimated in the hundreds of billions of dollars, by catching more underreporting earlier and more precisely.
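To make the no-change problem concrete, here is a minimal sketch of how a no-change rate could be computed from a set of closed examinations. The `ExamResult` type, the `no_change_rate` function, and the sample records are hypothetical illustrations, not IRS data or the agency's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamResult:
    """One closed examination (hypothetical record layout)."""
    return_id: str
    adjustment: float  # dollar change after examination; 0 means "no change"


def no_change_rate(results):
    """Share of examined returns that closed with no adjustment."""
    if not results:
        raise ValueError("no examination results supplied")
    no_change = sum(1 for r in results if r.adjustment == 0)
    return no_change / len(results)


# Made-up sample: two of four examinations produced no adjustment.
sample = [
    ExamResult("A", 0.0),
    ExamResult("B", 1200.0),
    ExamResult("C", 0.0),
    ExamResult("D", 350.0),
]
rate = no_change_rate(sample)
print(f"no-change rate: {rate:.0%}")
```

A selection model that learns from outcomes like these is, in effect, being asked to push this ratio down: fewer examinations that end where they started.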

For practitioners, the practical consequence is that audit risk is no longer a fixed score on a return. It is the output of a model that weighs third-party data, prior-year patterns, and anomalies. The trigger threshold has changed, even if the trigger list has not, and the basis for selection is now harder for an outsider to reconstruct than a DIF score cutoff ever was.

Third-party data matching at scale

The second production change is the scale of automated cross-checking. The IRS now uses machine learning to compare what taxpayers report on their returns against information statements filed by employers, payment processors, and other third parties: W-2 wage forms, 1099-NEC contractor payments, 1099-K payment-card and third-party network transactions, and similar documents. When the return and the third-party data do not align, the system generates a CP2000 notice, an underreported-income letter that has been a fixture of automated enforcement for years.

What has changed is the volume and the speed. Practitioner reporting on IRS staffing and AI use and a Nexairi practitioner guide on AI audit selection describe an expansion in both. A mismatch that would once have sat in a queue can now generate a notice inside the same filing season. That compresses response windows for clients and raises the stakes for reconciliation work done before the return is filed, not after the notice arrives.

Fraud detection and identity verification

The third production change is fraud detection. The IRS is using AI to flag fraudulent returns before refund issuance, to verify taxpayer identity at filing and through online accounts, and to score compliance risk across categories. Accounting Today's analysis names three taxpayer segments as being in the crosshairs of the new enforcement logic: high-income earners, partnerships and other pass-through entities, and users of digital assets. The first two are longstanding enforcement priorities; the third is newer and reflects IRS attention to cryptocurrency transactions and the reporting rules attached to them.

The accuracy of these models is not a settled question. The GAO has flagged information-quality and skills gaps inside the IRS that affect model reliability. TIGTA's 2025 audit recommends the agency measure the performance of these models against actual examination outcomes rather than internal benchmarks alone. Practitioners advising clients inside the system should expect both higher flagging rates in the named categories and a more contested appeals environment if those flags are challenged.

What changes in the practitioner workday

Four operational moves follow directly from the changes above.

Tighten the reconciliation step before filing. Third-party data matching will run on the same timeline as the return, so any unreconciled W-2, 1099-NEC, or 1099-K should be resolved with the client before submission rather than after a CP2000 arrives.
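A firm could run a pre-filing check along these lines. This is a minimal sketch under stated assumptions: the `reconcile` function, the tuple layout for information statements, the one-dollar tolerance, and the sample figures are all hypothetical, and the comparison is deliberately simple. In practice a 1099-K reports gross payment amounts, so a difference there may have a legitimate explanation that belongs in the workpapers rather than in a corrected figure.

```python
from collections import defaultdict
from decimal import Decimal


def reconcile(statements, reported, tolerance=Decimal("1.00")):
    """Compare information-statement totals with amounts on the draft return.

    statements: iterable of (form_type, payer, amount) tuples, amount as a string
    reported:   dict mapping form_type to the Decimal total on the draft return
    Returns a dict of form_type -> (statement total - reported total) for every
    form type whose difference exceeds the tolerance.
    """
    totals = defaultdict(Decimal)
    for form_type, _payer, amount in statements:
        totals[form_type] += Decimal(amount)

    mismatches = {}
    for form_type in sorted(set(totals) | set(reported)):
        stmt_total = totals.get(form_type, Decimal("0"))
        rep_total = reported.get(form_type, Decimal("0"))
        if abs(stmt_total - rep_total) > tolerance:
            mismatches[form_type] = stmt_total - rep_total
    return mismatches


# Made-up client file: one 1099-NEC was left off the draft return.
statements = [
    ("W-2", "Acme Corp", "85000.00"),
    ("1099-NEC", "Client A", "12000.00"),
    ("1099-NEC", "Client B", "4500.00"),
    ("1099-K", "PayCo", "9800.00"),
]
reported = {
    "W-2": Decimal("85000.00"),
    "1099-NEC": Decimal("12000.00"),
    "1099-K": Decimal("9800.00"),
}

for form_type, gap in reconcile(statements, reported).items():
    print(f"{form_type}: statements exceed return by {gap}")
```

Using `Decimal` rather than floats keeps the comparison exact to the cent, and checking the union of form types surfaces both statements missing from the return and reported income with no supporting statement.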

Document the basis for positions taken on returns in the named high-risk categories. If a partnership reports a loss, a high-income taxpayer takes a position on a Schedule item, or a digital-asset transaction is reported, the workpapers should support the position in language an examiner and a model can both follow. Professional society guidance for practitioners increasingly treats documentation as the first line of defense when an AI-flagged return is selected for examination.

Prepare clients for faster, more frequent notices. CP2000 timing, identity-verification holds on refunds, and information-document inquiry letters are all flowing on shorter cycles. A client who expects a six-month quiet period after filing is using the wrong mental model for this IRS.

Treat model-driven selection as contestable. The TIGTA and GAO work makes clear that the IRS's own oversight bodies have flagged performance and information-quality issues in the new systems. A flagged return is not an automatic adjustment; it is the start of a process, and a documented position remains the practitioner's strongest asset.

The IRS that existed when many practitioners started their careers is gone. The new one runs case selection, data matching, and fraud detection on machine-learning infrastructure, with a smaller human staff and a documented set of model risks. Small-firm practitioners cannot change the system, but they can change how their clients enter it. That is the work for this filing season and the next one.