Recent advances in large language models (LLMs) have drastically improved their ability to reason through answers to prompts. But it turns out that as their ability to reason improves, they increasingly fall victim to a relatable problem: analysis paralysis.
A recent preprint paper from a large team, which includes authors from the University of California, Berkeley; ETH Zurich; Carnegie Mellon University; and the University of Illinois Urbana-Champaign, found that LLMs with reasoning are prone to overthinking.
In other words, the model gets stuck in its own head.
What does it mean to overthink?
The paper on overthinking, which has not yet been peer reviewed, defines overthinking as “a phenomenon where models favor extended internal reasoning chains over environmental interaction.”
Alejandro Cuadrón, a research scholar at UC Berkeley and a coauthor of the paper, drew an analogy to the very human problem of making decisions without certainty about the results.
“What happens when we really don’t have enough information?” asks Cuadrón. “If you’re asking yourself more and more questions, just talking to yourself…in the best scenario, I’ll realize I need more information. In the worst, I’ll get the wrong results.”
To test how the latest AI models handle this situation, Cuadrón and his colleagues tasked leading reasoning LLMs (also known as large reasoning models, or LRMs), such as OpenAI’s o1 and DeepSeek-R1, with solving problems in a popular software-engineering benchmark. The models had to find bugs and design solutions using the OpenHands agentic platform.
The results? While the best reasoning models performed well overall, reasoning models were found to overthink nearly three times as often as nonreasoning models. And the more a model overthought, the fewer issues it resolved: on average, each unit increase in a model’s overthinking score corresponded to a 7.9 percent drop in its success rate.
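As a rough illustration of that kind of relationship (using made-up numbers, not the paper’s data or methodology), the sketch below fits a least-squares line to hypothetical overthinking scores and issue-resolution rates; the slope plays the role of the reported per-unit decrease in success.

```python
# Illustrative sketch only: hypothetical overthinking scores and resolution
# rates, NOT data from the paper. The slope of a least-squares fit stands in
# for the reported "7.9 percent less successful per unit of overthinking."
import numpy as np

# Hypothetical (overthinking score, % of issues resolved) pairs
overthinking = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
resolved_pct = np.array([40.0, 32.2, 24.1, 16.4, 8.3])  # made-up values

# Fit a line: resolved_pct ~ slope * overthinking + intercept
slope, intercept = np.polyfit(overthinking, resolved_pct, 1)
print(f"Change in resolution rate per unit of overthinking: {slope:.1f} percentage points")
```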
Reasoning models based on LLMs with relatively few parameters, such as Alibaba’s QwQ-32B (which has 32 billion parameters), were especially prone to overthinking. QwQ, DeepSeek-R1 32B, and Sky-T1-R had the highest overthinking scores, and they weren’t any more successful at resolving tasks than nonreasoning models.
Cuadrón says this shows a link between a model’s general level of intelligence and its ability to successfully reason through problems.
“I think model size is one of the key contributors, as model size leads to ‘smartness,’ so to speak,” says Cuadrón. “To avoid overthinking, a model must interact with and understand the environment, and it must understand its output.”
Overthinking is an expensive mistake
AI overthinking is an intriguing problem from a human perspective, as it mirrors the state of mind we often struggle with. But LLMs are, of course, computer systems, which means…

The post “AI Overthinking: How LLMs Fall into Analysis Paralysis” by Matthew S. Smith was published on 03/05/2025 by spectrum.ieee.org