AI Prompt Engineering Is Dead

Since ChatGPT dropped in the fall of 2022, everyone and their donkey has tried their hand at prompt engineering—finding a clever way to phrase your query to a large-language model (LLM) or AI art or video generator to get the best results or side-step protections. The internet is replete with prompt engineering guides, cheat sheets and advice threads to help you get the most out of an LLM.

In the commercial sector, companies are now wrangling LLMs to build product co-pilots, automate tedious work, create personal assistants, and more, says Austin Henley, a former Microsoft employee who conducted a series of interviews with people developing LLM-powered co-pilots. “Every business is trying to use it for virtually every use case that they can imagine,” Henley says.

“The only real trend may be no trend. What’s best for any given model, dataset, and prompting strategy is likely to be specific to the particular combination at hand.” —Rick Battle & Teja Gollapudi, VMware

To do so, they’ve enlisted the help of prompt engineers professionally.

However, new research suggests that prompt engineering is best done by the model itself, and not by a human engineer. This has cast doubt on prompt engineering’s future—and increased suspicions that a fair portion of prompt engineering jobs may be a passing fad, at least as the field is currently imagined.

Autotuned prompts are successful and strange

Rick Battle and Teja Gollapudi at California-based cloud computing company VMware were perplexed by how finicky and unpredictable LLM performance was in response to weird prompting techniques. For example, people have found that asking models to explain its reasoning step by step—a technique called chain-of-thought—improved their performance on a range of math and logic questions. Even weirder, Battle found that giving a model positive prompts, such as “this will be fun” or “you are as smart as chatGPT,” sometimes improved performance.

Battle and Gollapudi decided to systematically test how different prompt engineering strategies impact an LLM’s ability to solve grade school math questions. They tested three different open source language models with 60 different prompt combinations each. What they found was a surprising lack of consistency. Even chain-of-thought prompting sometimes helped and other times hurt performance. “The only real trend may be no trend,” they write. “What’s best for any given model, dataset, and prompting strategy is likely to be specific to the particular combination at hand.”

According to one research team, no human should manually optimize prompts ever again.

There is an alternative to the trial-and-error style prompt engineering that yielded such inconsistent results: Ask the language model to devise its own optimal prompt. Recently, new tools have been developed to automate this process. Given a few examples and a quantitative success metric, these tools will iteratively find the optimal phrase to…

Read full article: AI Prompt Engineering Is Dead