Since OpenAI’s launch of ChatGPT in 2022, AI companies have been locked in a race to build ever-larger models, pouring huge sums into data centers. But toward the end of last year, there were rumblings that the benefits of model scaling were hitting a wall. The underwhelming performance of OpenAI’s largest-ever model, GPT-4.5, gave further weight to the idea.
This is prompting a shift in focus toward making machines “think” more like humans: rather than simply building larger models, researchers are giving them more time to work through problems. In 2022, a team at Google introduced the chain-of-thought (CoT) technique, in which large language models (LLMs) work through a problem step-by-step.
This approach underpins the impressive capabilities of a new generation of reasoning models like OpenAI’s o3, Google’s Gemini 2.5, Anthropic’s Claude 3.7, and DeepSeek’s R1. And AI papers are now awash with references to “thought,” “thinking,” and “reasoning” as cognitively inspired techniques proliferate.
“Since about the spring of last year, it has been clear to anybody who is serious about AI research that the next revolution will not be about scale,” says Igor Grossmann, a professor of psychology at the University of Waterloo, Canada. “The next revolution will be about better cognition.”
How AI Reasoning Works
At their core, LLMs use statistical probabilities to predict the next token—the technical name for the chunks of text that models work with—in a string of text. But the CoT technique showed that simply prompting the models to respond with a series of intermediate “reasoning” steps before arriving at an answer significantly boosted performance on math and logic problems.
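To make the idea concrete, here is a minimal sketch of direct prompting versus CoT prompting, using the OpenAI Python SDK as one possible client. The model name and prompt wording are illustrative assumptions, not taken from the article.

```python
# Minimal sketch: direct prompting vs. chain-of-thought (CoT) prompting.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model name below is illustrative.
from openai import OpenAI

client = OpenAI()

question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model would do here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Direct prompting: ask for the answer outright.
direct = ask(f"{question} Reply with only the answer.")

# CoT prompting: elicit intermediate "reasoning" steps before the final
# answer, which is what boosts performance on math and logic problems.
cot = ask(f"{question} Let's think step by step, "
          "then give the final answer on its own line.")

print("Direct:", direct)
print("CoT:\n", cot)
```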
“It was a surprise that it worked so incredibly well,” says Kanishk Gandhi, a computer science graduate student at Stanford University. Since then, researchers have devised a host of extensions of the technique, including “tree of thought,” “diagram of thought,” “logic of thought,” and “iteration of thought,” among others.
Leading model developers have also used reinforcement learning to bake the technique into their models, by getting a base model to produce CoT responses and then rewarding those that lead to the best final answers. In the process, models have developed a variety of cognitive strategies that mirror how humans solve complex problems, says Gandhi, such as breaking them down into simpler tasks and backtracking to correct mistakes in earlier reasoning steps.
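A highly simplified sketch of that reward scheme follows. Real systems use full reinforcement-learning algorithms such as PPO to update the model; the `sample_cot` stub and the “Answer:” output format here are assumptions made for illustration.

```python
import re

def sample_cot(model, problem: str, n: int = 8) -> list[str]:
    """Hypothetical stand-in: sample n chain-of-thought responses
    from a base model for the given problem."""
    raise NotImplementedError("wire this to an actual model")

def final_answer(response: str) -> str | None:
    # Assume the model ends its reasoning with a line like "Answer: 42".
    match = re.search(r"Answer:\s*(\S+)", response)
    return match.group(1) if match else None

def reward(response: str, ground_truth: str) -> float:
    # Verifiable reward: 1.0 if the final answer matches, else 0.0.
    return 1.0 if final_answer(response) == ground_truth else 0.0

def training_step(model, problem: str, ground_truth: str):
    # Score each sampled CoT response; an RL update (omitted here) would
    # then make high-reward responses more likely, baking CoT strategies
    # into the model itself.
    responses = sample_cot(model, problem)
    return [(r, reward(r, ground_truth)) for r in responses]
```

Note that only the final answer is checked, so this reward is easy to compute exactly when a task has a verifiable answer, which is the constraint Saxon describes next.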
But the way these models are trained can lead to problems, says Michael Saxon, a graduate student at the University of California, Santa Barbara. Reinforcement learning requires a way to verify whether a response is correct in order to decide whether to give a reward. This means reasoning models have primarily been trained on tasks where verification is easy, such as math,…

The post “AI Developers Look Beyond Chain-of-Thought Prompting” by Edd Gent was published on 05/08/2025 by spectrum.ieee.org