Clones, Parrots and Puzzles: Is AI’s Thinking Still an Illusion?
(Spoiler alert: this post discusses key plot points from the movie “The Prestige”)
A few weeks ago, I started reading Apple’s research paper The Illusion of Thinking. The title itself reminded me of the movie The Prestige, which is also about illusions. And then, almost immediately, I thought of its famous opening line:
Are you watching closely?
Curiosity got the better of me. I paused my reading, grabbed a coffee and rewatched The Prestige before returning to the paper. And while reading, the parallels became impossible to ignore. Both the film and the research deal with illusions - convincing performances that mask fragile realities beneath. The deeper I read, the more it felt like the rivalry between Angier and Borden (our two magicians in the movie), only now the stage was AI reasoning instead of Victorian magic.
The Pledge: The Promise of Reasoning
In The Prestige, every trick begins with the Pledge. The magician shows you something ordinary: a bird, a hat. In AI terms, this is the moment we marvel at the chain-of-thought capabilities of Large Reasoning Models (LRMs). Ask them to solve a small puzzle and they impress us with structured reasoning. Just like the magician’s opening move, the performance feels trustworthy.
But as Apple’s researchers found, at this low level of complexity, simpler LLMs actually perform better than their reasoning-heavy counterparts. The models that “think too much” often wander off, losing sight of the obvious. Overthinking becomes a liability - like a magician fumbling a simple coin trick. It’s a reminder that sometimes the simplest approach is not only sufficient but superior.
The Turn: Escalation and Exploration
The second act of a trick is the Turn, where the ordinary becomes extraordinary. In The Prestige, the bird vanishes, the hat disappears. Here, the magician asks you to believe. This is where LRMs shine brightest. On medium-complexity puzzles, their ability to explore multiple reasoning paths pays off. They may make mistakes, but they often self-correct and deliver the right result.
Like Borden’s complex illusions, the reasoning feels real. And in that moment, it is easy to believe that these models are really thinking. But are they? Or are we, the audience, just too willing to be impressed?
The Prestige: Collapse at Complexity
The final act is the Prestige - the reveal, the impossible comeback. Angier steps through the Tesla machine and reappears across the stage. But as the film shows, the Prestige comes at a cost: drowned clones, a hidden trail of failure.
Apple’s findings mirror this. At high complexity, both LRMs and standard LLMs collapse. Accuracy falls to zero. Even worse, LRMs reduce their reasoning effort despite having plenty of capacity left. It’s like Angier refusing to step into the machine on his hundredth show because, deep down, he knows the illusion no longer holds.
And even when Apple’s researchers handed the models the explicit algorithm for puzzles like the Tower of Hanoi, they still failed. Knowledge wasn’t the issue; execution was. The machine could copy the trick but not understand the stakes. It was illusion, not thinking.
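What makes this failure striking is how small the handed-over algorithm is. A minimal recursive sketch of the Tower of Hanoi solution (function and peg names are mine, not the exact prompt from the paper): knowing these few lines is trivial, but faithfully executing them move by move for many disks is where the models broke down.

```python
def hanoi(n, source, target, spare, moves):
    """Recursively move n disks from the source peg to the target peg."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear n-1 disks out of the way
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))  # 7 - the optimal solution for 3 disks is 2^3 - 1 moves
```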
Illusion or Thinking?
This is the heart of the matter. Fluent text, well-structured reasoning steps, even confident explanations - none of these mean that real reasoning has happened. They show us the illusion of reasoning. What Apple’s paper reveals is that when problems get too complex, the illusion collapses. The performance cannot hide the absence of true understanding.
This connects directly to an earlier warning. In 2021, the now-famous “Stochastic Parrots” paper argued that large language models are not truly reasoning but parroting - generating sequences of words based on probability, without genuine understanding. That was the theory. Apple’s Illusion of Thinking makes it practical: by testing models on controlled puzzles, it shows exactly how the illusion breaks down. The parrot can recite convincing lines, but when the task demands true reasoning, the mimicry is exposed.
Much like in The Prestige, the trick only works as long as the audience believes. The transported man is not magic, the reasoning is not thought - it is performance. And just like Borden’s twin or Angier’s clones, the truth behind the curtain is less elegant, more mechanical.
Yann LeCun, one of the pioneers of modern AI, uses a simple analogy here: imagine two types of students. Some memorize answers by heart and do well on tests, but when the questions change even slightly, they get stuck. Others learn the deeper concepts and build mental models they can reuse for many kinds of questions. Current LLMs are clearly in the first group - they memorize patterns and repeat them well, but struggle when the problem goes beyond what they’ve seen. Most humans, in contrast, can switch between the two approaches depending on the situation. And just like in The Prestige, this is the difference between repeating a trick by memory and truly understanding the mechanics behind it - between copying the illusion and mastering the craft.
The Illusion of the Illusion?
Of course, not everyone agrees with Apple’s conclusions. Shortly after the paper appeared, another group of researchers published a response with a telling title: “The Illusion of the Illusion of Thinking.” Their argument was that some of the failures Apple reported were not about reasoning itself but about how the experiments were designed. For example, puzzles were sometimes set up in ways that made them unsolvable within the model’s token limits, or evaluation metrics punished models for stylistic differences rather than actual correctness. Under different prompts and setups, these critics argued, models perform much better.
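The token-limit objection is easy to make concrete: an optimal Tower of Hanoi solution for n disks takes 2^n - 1 moves, so the transcript a model must emit grows exponentially with the number of disks. A quick back-of-the-envelope sketch (the disk counts here are illustrative, not the exact instance sizes either paper tested):

```python
# An optimal Tower of Hanoi solution for n disks takes 2^n - 1 moves,
# so the move list a model must write out grows exponentially with n.
for n in (5, 10, 15, 20):
    moves = 2**n - 1
    print(f"{n:2d} disks -> {moves:,} moves")
# At even modest disk counts, the full move list no longer fits in a
# typical output window - regardless of whether the model "understands"
# the puzzle.
```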
So who’s right? Probably neither. What this debate shows is that we still don’t have a clear way to measure “reasoning” in AI. Is it the ability to generate correct step-by-step solutions? Is it producing the right final answer? Or is it something deeper that we have not yet captured? Just as in The Prestige, where the audience chooses what to believe, in AI research the framing of the experiment often shapes the illusion we see.
Leading Through Illusions
As product people, our job is not only to be impressed by what AI can do but also to be responsible in how we use it. LLMs and LRMs can feel magical, but as Apple’s paper shows, they are still illusions - especially when complexity rises. The question is: how do we manage those illusions in practice?
A few hands-on reflections:
- AI copilots: They work well for drafting emails, writing simple code or summarizing documents - the “medium complexity” sweet spot. But hand them critical compliance rules or multi-step financial flows and the illusion can break. We should frame copilots as assistants, not decision makers.
- Recommendation systems: It’s easy to mistake fluent suggestions for deep personalization. In reality, they’re probabilistic matches. Building trust means combining them with transparent signals, so users don’t feel tricked by a black box.
- Compliance and risk workflows: In areas like finance or healthcare, failure at high complexity isn’t just embarrassing - it’s dangerous. Here, illusions must be caught by strong guardrails: human review, audit trails, fallback paths. We need to design for graceful failure, not just shiny success.
The urge in product work is to oversell the magic. But if we deploy AI without honesty about its limits, the cost in trust will be higher than the short-term wow factor.
The real prestige for us isn’t pulling off one amazing trick. It’s creating products where users feel safe to believe - not because they’re fooled, but because they know we’ve built systems that won’t collapse when the curtain is pulled back.
Are We Watching Closely Enough?
At the end of The Prestige, Nolan forces us to ask: was the sacrifice worth it? The pursuit of perfect illusion consumed both Borden and Angier. In our world, the pursuit of perfect “thinking machines” risks the same: overpromising, overselling and eventually disappointing.
Our role as product people is not to sell the illusion blindly, but to understand the mechanics behind it - to know when the bird is swapped, when the twin steps in, when the clone drowns backstage.
Because in product, as in magic, the illusion isn’t the endgame. The real magic is in delivering value without losing the trust of your audience.
So, the question remains:
Are you watching closely?
