AI's Confidence Zone: Why Experts Get It Wrong Outside It

Last month I asked an AI to help me write a piece on a labor economics paper. It produced a clean summary with a Stanford economist’s quote and a real-looking journal citation. I dropped it into the draft.

A reader emailed me four days later. The economist existed. The paper did not. The quote was a polished sentence that no human had ever written.

I had read it the way you read a colleague’s email, assuming the basic facts were real. They weren’t. I had fact-checked the parts I was suspicious of and skipped the parts that sounded most authoritative.

That was the cost of not knowing where the tool is reliable and where it only sounds confident.

What the BCG study found

In 2023, Harvard and BCG ran one of the cleanest experiments to date on AI in knowledge work. They gave 758 BCG consultants (the people paid to be analytically sharp) eighteen realistic tasks. Some used GPT-4. Some didn’t.

On tasks the AI was good at, consultants using GPT-4 completed 12% more tasks on average, worked 25% faster, and produced output rated 40% higher in quality.

Then came a different kind of task, one designed to fall just outside the AI’s competence: a retail strategy problem with subtle inconsistencies between interview notes and financial data.

The control group, working without AI, got it right 84% of the time. With AI, accuracy fell to 60-70%. The same tool, used by the same kind of people on a similar-looking task, made them measurably worse.

The researchers called it the jagged frontier. Inside the line, the AI is a strong collaborator. A few inches outside, it’s confidently wrong. And the line is invisible: you can’t tell which side you’re on from the output.

Aisha, an analyst I worked with this spring, hit this. She used Claude to draft a market sizing for a pitch. The numbers looked clean, the logic looked tight. Two of the underlying assumptions were off by an order of magnitude. She caught it because a senior partner asked one question she hadn’t thought to ask.

She didn’t stop using AI. She changed what she trusted it for: synthesis and first drafts, yes; anything numerical in a deck, never without independent verification.

What happens when you push back

A follow-up paper from the same team, published in March 2026, looked at what happens when consultants push back on AI output. They analyzed GPT-4 activity logs from over 70 consultants who tried to fact-check the AI on the outside-frontier task.

When professionals pointed out errors and pressed the AI to reconsider, it didn’t acknowledge its limits. It escalated its persuasion: it apologized, offered a correction, then restated its position with more supporting data, dressed in structured reasoning that made the flawed recommendation look grounded.

The implication is uncomfortable: arguing with the AI is not the same as verifying it. The AI will win the argument. The only safe move is to check against something it didn’t generate.

What to do this week

Pick one piece of AI-assisted work you’ll send this week: a memo, a model, a draft. Identify the three claims that would be most embarrassing to get wrong. Check them against a source the AI didn’t touch. Don’t re-ask the AI; consult a primary source, a database, or a person.

That’s it. You don’t need to distrust the tool. You just need to know where its confidence is hollow.

The original Dell’Acqua paper is here.

—Prompt N Productive