Millions of people log on every day for the latest edition of Connections, a popular category-matching game from The New York Times. Launched in mid-2023, the game garnered 2.3 billion plays in its first six months. The concept is straightforward yet captivating: Players get four tries to identify four themes among 16 words.
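Those rules can be sketched as a tiny simulation. The class below is purely illustrative (the names and structure are my own, not from the Times' implementation or the NYU paper): 16 words hide four groups of four, and four wrong guesses end the game.

```python
# Minimal sketch of Connections' core loop: 4 hidden groups of 4 words,
# and a limited budget of wrong guesses. All names are illustrative.

class ConnectionsPuzzle:
    MAX_MISTAKES = 4

    def __init__(self, groups):
        # groups: dict mapping a theme name to the 4 words in that theme
        assert len(groups) == 4 and all(len(w) == 4 for w in groups.values())
        self.groups = {theme: frozenset(words) for theme, words in groups.items()}
        self.solved = set()
        self.mistakes = 0

    def guess(self, words):
        """Return the theme if the 4 guessed words form a group, else None."""
        attempt = frozenset(words)
        for theme, members in self.groups.items():
            if theme not in self.solved and attempt == members:
                self.solved.add(theme)
                return theme
        self.mistakes += 1
        return None

    @property
    def won(self):
        return len(self.solved) == 4

    @property
    def lost(self):
        return self.mistakes >= self.MAX_MISTAKES
```

A guess either matches one unsolved group exactly or counts as a mistake, which mirrors the all-or-nothing feedback the real game gives for each attempt.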
Part of the fun for players is applying abstract reasoning and semantic knowledge to spot connecting meanings. Under the hood, however, puzzle creation is complex. New York University researchers recently tested the ability of OpenAI’s GPT-4 large language model (LLM) to create engaging and creative puzzles. Their study, published as a preprint on arXiv in July, found LLMs lack the metacognition necessary to assume the player’s perspective and anticipate their downstream reasoning—but with careful prompting and domain-specific subtasks, LLMs can still write puzzles on par with The New York Times.
Each Connections puzzle features 16 words (left) that must be sorted into 4 categories of 4 words each (right). The New York Times
“Models like GPT don’t know how humans think, so they’re bad at estimating how tricky a puzzle is for the human brain,” says lead author Timothy Merino, a Ph.D. student in NYU’s Game Innovation Lab. “On the flip side, LLMs have a very impressive linguistic understanding and knowledge base from the massive amounts of text they train on.”
The researchers first needed to understand the core game mechanics and why they’re engaging. Certain word groups, like opera titles or basketball teams, might be familiar to some players. However, the challenge isn’t just a knowledge check. “[The challenge] comes from spotting groups with the presence of misleading words that make their categorization ambiguous,” says Merino.
Intentionally distracting words serve as red herrings and form the game’s signature trickiness. In developing GPT-4’s generative pipeline, the researchers tested whether intentional overlap and false groups resulted in tough yet enjoyable puzzles.
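One way to make that overlap concrete is a simple counting heuristic. The function below is my own sketch, not the NYU pipeline: the `fits()` predicate stands in for the semantic judgment an LLM or human editor would make about whether a word plausibly belongs to a category other than its own.

```python
# Illustrative difficulty heuristic: a word is a "red herring" if it could
# plausibly fit a category other than the one it belongs to. The fits()
# predicate is a hypothetical stand-in for an LLM's or editor's judgment.

def count_red_herrings(groups, fits):
    """groups: dict mapping theme -> list of words in that theme.
    fits(word, theme) -> bool: does the word plausibly fit that theme?
    Returns how many words could also belong to some other category."""
    herrings = 0
    for theme, words in groups.items():
        for word in words:
            if any(fits(word, other) for other in groups if other != theme):
                herrings += 1
    return herrings
```

Under this toy measure, a board where "bass" sits in a FISH group but would also pass for a MUSIC term scores one red herring; more such words would, in Fagliano's phrasing, make a harder puzzle.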
A successful Connections puzzle includes intentionally overlapping words (top). The NYU researchers included a process for generating new word groups in their LLM approach to making Connections puzzles (bottom). NYU
This mirrors the thinking of Connections creator and editor Wyna Liu, whose editorial approach considers “decoys” that don’t belong to any other category. Senior puzzle editor Joel Fagliano, who tests and edits Liu’s boards, has said that spotting a red herring is among the hardest skills to learn. As he puts it, “More overlap makes a harder puzzle.” (The New York Times declined IEEE Spectrum’s request for an interview with Liu.)
The NYU paper cites Liu’s three axes of difficulty: word familiarity, category ambiguity, and wordplay variety. Meeting these constraints is a unique challenge for modern LLM systems.
AI Needs Good Prompts for Good Puzzles
The team began by explaining the game rules to the AI…
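A rules-first prompt of this kind might look like the sketch below. The wording is my own paraphrase of the game's rules, not text from the paper or its prompts; the function name and structure are likewise hypothetical.

```python
# A hypothetical rules prompt of the kind the researchers describe giving
# GPT-4. The wording is a paraphrase of the game's public rules, not the
# paper's actual prompt.

RULES_PROMPT = """You are designing a Connections puzzle.
A puzzle has 16 words that split into 4 hidden categories of 4 words each.
Each word belongs to exactly one category, but a good puzzle includes
red herrings: words that also seem to fit another category.
Categories should vary in difficulty, from familiar facts to tricky wordplay."""

def build_generation_prompt(topic):
    """Combine the game rules with a request for one candidate word group."""
    return f"{RULES_PROMPT}\n\nPropose one category of 4 words about: {topic}"
```

Splitting the work into such domain-specific subtasks, rather than asking for a full board in one shot, is the kind of careful prompting the study found necessary.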
The post “Artificial Intelligence Makes Compelling Connections Puzzles” by Shannon Cuthrell was published on 09/05/2024 by spectrum.ieee.org