Microsoft Unveils New AI Model to Edit Video Games

So far, AI has only nipped at the edge of the games industry with tools for art, music, writing, coding, and other elements that make up video games. But what if an AI model could generate examples of gameplay from a single screenshot?

That’s the idea behind Microsoft’s Muse, a transformer model with 1.6 billion parameters trained on 500,000 hours of player data. The result is a model that, when prompted with a gameplay screenshot, can generate multiple examples of gameplay, each extending up to several minutes in length.

“They have trained what’s essentially a neural game engine that has unprecedented temporal coherence and fidelity,” says Julian Togelius, an associate professor of computer science at New York University and co-founder of AI game testing company Modl.ai. “That has wide implications and is something I could see being used in the future as part of game development more generally.”

How Microsoft’s Muse Works

Muse (also known as the World and Human Action Model, or WHAM) was trained on human gameplay data from the multiplayer action game Bleeding Edge. The researchers trained a series of models on that data, ranging from 15 million to 1.6 billion parameters; the largest, which performed best, is the focus of a paper published in February in Nature.

Though innovative, Muse isn’t the first AI model capable of generating gameplay. Notable predecessors include Google DeepMind’s Genie, Tencent’s GameGen-X, and GameNGen. These earlier models generate visually attractive gameplay and, in many cases, do so at higher frame rates and resolutions than Muse.

However, Microsoft’s approach to developing Muse offers several unique advantages.

Unlike prior models, Muse was trained on real-world human gameplay data that pairs image frames from gameplay with the corresponding controller inputs. Microsoft was able to access this data through Ninja Theory, a game developer owned by Microsoft’s Xbox Game Studios. Genie and GameGen-X, by contrast, didn’t have access to controller inputs and instead trained on publicly available image data from various games.
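To make the pairing concrete, here is a minimal, hypothetical sketch of what one such training record might look like: aligned gameplay frames and controller states. The field names and structure are illustrative assumptions, not Microsoft’s actual data schema.

```python
# Hypothetical shape of a frame/controller-input training record.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ControllerState:
    left_stick: Tuple[float, float]   # stick position, x and y in [-1, 1]
    right_stick: Tuple[float, float]
    buttons: int                      # bitmask of currently pressed buttons


@dataclass
class GameplayRecord:
    frames: List[bytes]               # encoded game frames, one per timestep
    actions: List[ControllerState]    # controller input captured at each timestep

    def __post_init__(self) -> None:
        # Frames and actions are aligned one-to-one, so a model can learn
        # which input produced which on-screen result.
        assert len(self.frames) == len(self.actions)
```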

Muse also uses an autoregressive transformer architecture, which is uncommon for a model that generates images (gameplay, like video, is a series of images in sequence). Muse generates gameplay as sequences of discrete tokens that interleave images and controller actions. While Genie uses a transformer architecture, it doesn’t model controller input. GameNGen and GameGen-X, meanwhile, use specialized diffusion models to generate gameplay, and again don’t model controller input.
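The following sketch shows, in schematic form, what autoregressive generation over interleaved image and action tokens looks like. The `sample_next_token` stub stands in for a real transformer, and the token counts are illustrative assumptions rather than Muse’s actual configuration.

```python
# A minimal sketch of autoregressive generation over interleaved
# action and image tokens. Everything here is illustrative.
import random
from typing import List

TOKENS_PER_FRAME = 256    # assumed number of discrete tokens per encoded frame
TOKENS_PER_ACTION = 16    # assumed number of tokens per controller action
VOCAB_SIZE = 4096         # assumed size of the shared token vocabulary


def sample_next_token(context: List[int]) -> int:
    # Placeholder for the transformer's next-token prediction.
    return random.randrange(VOCAB_SIZE)


def generate_gameplay(prompt_frame_tokens: List[int], num_steps: int) -> List[int]:
    """Extend a tokenized screenshot into a sequence of action and frame tokens."""
    sequence = list(prompt_frame_tokens)
    for _ in range(num_steps):
        # Each simulated timestep first predicts a controller action...
        for _ in range(TOKENS_PER_ACTION):
            sequence.append(sample_next_token(sequence))
        # ...then the image tokens for the frame that action produces.
        for _ in range(TOKENS_PER_FRAME):
            sequence.append(sample_next_token(sequence))
    return sequence
```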

“What we’ve seen so far is we haven’t been able to get the consistency with diffusion models that we have with autoregressive models,” says Katja Hofmann, a senior principal research manager at Microsoft Research.

The researchers built a frontend called the WHAM Demonstrator to show off the model’s consistency. It can be used to prompt Muse with a screenshot, which then…

The post “Microsoft Unveils New AI Model to Edit Video Games” by Matthew S. Smith was published on 03/11/2025 by spectrum.ieee.org