THE NEW YORK TIMES: A word to the wise — don’t trust artificial intelligence to file your taxes

AI is used by the world’s military to operate sophisticated drones. It has replaced thousands of coders at the most advanced technology companies. Just don’t, whatever you do, use it to file your taxes.

Stuart A. Thompson
The New York Times
Credit: tungnguyen0905/Pixabay

Artificial intelligence is used by the world’s military to operate sophisticated drones. It has replaced thousands of coders at the most advanced technology companies. It is even upending how cancer patients are treated, potentially saving lives.

Just don’t, whatever you do, use it to file your taxes.

To assess the technology’s ability to file a federal income tax return, The New York Times tested four AI chatbots — Google’s Gemini, OpenAI’s ChatGPT, Anthropic’s Claude and xAI’s Grok — to see how well they fared with eight fictional tax situations written as part of training materials by TaxSlayer, a tax-filing service.

They struggled, hard, miscalculating the refund or amount owed to the IRS by an average of more than $2,000. Even when provided with all the necessary materials, including all the forms they needed to fill out, the chatbots whiffed on some calculations.

“The problem with taxes is all those very small little details matter, and it’s not going to get every single little detail right,” said Benedict Evans, an analyst who writes a technology newsletter.

“These models get dramatically better over the course of every six months,” he added. “But they still give you what is roughly the right answer, and that’s not what you want.”

(The Times has sued OpenAI and its partner, Microsoft, claiming copyright infringement of news content related to AI systems. OpenAI and Microsoft have denied those claims.)

The problem comes down to how AI chatbots are fundamentally designed: They do not truly understand the complex relationships among the pieces of information they are processing.

Their power to predict the next appropriate word in a sequence makes them smart in some areas — like reading and writing — but leaves them exceptionally weak in others — like actively remembering a lot of interconnected information without errors sneaking into their responses.

Those weaknesses prove tricky for filing taxes, which can require dozens of forms that inform one another and need to be updated in a specific sequence. AI tools struggle to follow complex procedures perfectly, and errors can accumulate as a task becomes more complex.

The issue amounts to a “tax-code paradox,” said Erik Brynjolfsson, a senior fellow at the Stanford Institute for Human-Centered AI. The shortcoming reflects much larger challenges that AI companies are facing in expanding the tools into all areas of life.

“Traditional tax software like TurboTax is procedural, following ‘if-then’ logic built for mathematical precision,” Brynjolfsson said of existing online filing tools. Large language models, by contrast, are prediction engines that “can be superhuman at many tasks yet fail at some that seem simpler to humans.”
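
To make the "if-then" point concrete, here is a minimal sketch in Python of how a procedural calculation might look. The bracket thresholds and rates are invented for illustration and are not real IRS figures, and the names `BRACKETS` and `tax_owed` are hypothetical. It is not how any actual tax product is built. The point is only that the same input always produces the same answer, to the cent.

```python
from decimal import Decimal

# Hypothetical brackets for illustration only; NOT real IRS figures.
# Each entry is (upper limit of the bracket, marginal rate);
# None marks the open-ended top bracket.
BRACKETS = [
    (Decimal("10000"), Decimal("0.10")),
    (Decimal("40000"), Decimal("0.20")),
    (None, Decimal("0.30")),
]


def tax_owed(taxable_income):
    """Apply each bracket in order: plain if-then logic, no prediction."""
    income = max(Decimal(taxable_income), Decimal("0"))
    owed = Decimal("0")
    lower = Decimal("0")
    for upper, rate in BRACKETS:
        if upper is None or income <= upper:
            # Income ends inside this bracket: tax the remainder and stop.
            owed += (income - lower) * rate
            break
        # Income exceeds this bracket: tax the full bracket and move on.
        owed += (upper - lower) * rate
        lower = upper
    return owed.quantize(Decimal("0.01"))
```

Under these made-up brackets, `tax_owed("50000")` works out to 1,000 on the first 10,000, 6,000 on the next 30,000 and 3,000 on the last 10,000, for 10,000.00 in total.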

The chatbots did better in our tests when we gave the most advanced models a very organised picture of a fictional user’s finances, including sorting every piece of information by the corresponding IRS document they should have used and then uploading those documents.

But most people don’t file their taxes this way; they don’t know what documents to use or what claims to make. Modern tax software asks filers about their life — whether they have children in day care, for example, or use a car for work — then transfers that information directly into the correct forms.

Chatbots struggle with this kind of operation. Without specific instructions, they can only surface what is probably the most relevant information, which might not be what you need.

“If you ask it how many R’s are in ‘strawberry,’ it tells you how many R’s are probably in ‘strawberry,’” Evans said.

AI optimists hold a different view about where the tools might go next. They argue that the existing tools may begin to “think” more clearly through complex problems like taxes, applying active reasoning to find their way through IRS documents.

Adding tools on top of the chatbots — like a program that could validate whether the tax return passes all the IRS rules — could give them the help they need to get things right.
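
A rough sketch of that idea, in Python, might look like the following. Everything here is an assumption made for illustration: the field names (`wages`, `interest`, `total_income`, `tax`, `withheld`, `refund`) and the two checks are invented, and real IRS rules are far more numerous. The sketch shows only the general shape: a deterministic checker that reports which figures in a chatbot's draft do not add up, so the draft can be sent back for correction.

```python
def validate_return(draft):
    """Check a draft return (whole-dollar amounts) for internal consistency.

    Returns a list of problems; an empty list means every check passed.
    The fields and rules are hypothetical, not actual IRS requirements.
    """
    problems = []

    # Check 1: the total must equal the sum of its parts.
    expected_total = draft["wages"] + draft["interest"]
    if draft["total_income"] != expected_total:
        problems.append(
            f"total_income should be {expected_total}, got {draft['total_income']}"
        )

    # Check 2: the refund must equal tax withheld minus tax owed
    # (a negative value here would mean a balance due).
    expected_refund = draft["withheld"] - draft["tax"]
    if draft["refund"] != expected_refund:
        problems.append(
            f"refund should be {expected_refund}, got {draft['refund']}"
        )

    return problems


# A draft with one arithmetic slip, of the kind a chatbot might produce.
draft = {
    "wages": 52000,
    "interest": 300,
    "total_income": 52300,
    "tax": 6100,
    "withheld": 7000,
    "refund": 1900,  # should be 900
}
errors = validate_return(draft)
```

Here `errors` holds a single message flagging the refund, which could be handed back to the chatbot as a prompt to fix that line.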

That is similar to how AI chatbots have learned to code: They occasionally program things incorrectly but are good at understanding errors and coming up with fixes.

Claude, the AI chatbot from Anthropic, shows its “thought process” in real time. When we asked it to calculate how much a fictional user owed in federal taxes, it determined it needed a form from the IRS that it didn’t have. It described the need to fetch it from the internet and then did just that, downloading the form and completing the math required.

In that case, Claude got the tax refund correct. But it made many errors in other tests, including calculating a lower refund than the fictional person was owed.

Tax experts have suggested that the tools are still a helpful assistant to use alongside manual research. When we asked the chatbots simple tax questions, or asked the chatbots to describe in familiar language a complex IRS form, they performed well.

Everyday tax filers — and even professionals — can make plenty of mistakes when navigating the tax code on their own, too.

But experts have emphasised keeping humans in the loop.

This article originally appeared in The New York Times.

© 2026 The New York Times Company
