Two years after the generative AI boom really began with the launch of ChatGPT, it no longer seems that exciting to have a phenomenally helpful AI assistant hanging around in your web browser or phone, just waiting for you to ask it questions. The next big push in AI is for AI agents that can take action on your behalf. But while agentic AI has already arrived for power users like coders, everyday consumers don’t yet have these kinds of AI assistants.
That will soon change. Anthropic, Google DeepMind, and OpenAI have all recently unveiled experimental models that can use computers the way people do—searching the web for information, filling out forms, and clicking buttons. With a little guidance from the human user, they can do things like order groceries, call an Uber, hunt for the best price for a product, or find a flight for your next vacation. And while these early models have limited abilities and aren’t yet widely available, they show the direction that AI is going.
“This is just the AI clicking around,” said OpenAI CEO Sam Altman in a demo video as he watched the OpenAI agent, called Operator, navigate to OpenTable, look up a San Francisco restaurant, and check for a table for two at 7pm.
Zachary Lipton, an associate professor of machine learning at Carnegie Mellon University, notes that AI agents are already being embedded in specialized software for different types of enterprise customers such as salespeople, doctors, and lawyers. But until now, we haven’t seen AI agents that can “do routine stuff on your laptop,” he says. “What’s intriguing here is the possibility of people starting to hand over the keys.”
AI Agents from Anthropic, Google DeepMind, and OpenAI
Anthropic was the first to unveil this new functionality, with an announcement in October that its Claude chatbot can now “use computers the way humans do.” The company stressed that it was giving the models this capability as a public beta test, and that it’s only available to developers who are building tools and products on top of Anthropic’s large language models. Claude navigates by viewing screenshots of what the user sees and counting the pixels required to move the cursor to a certain spot for a click. A spokesperson for Anthropic says that Claude can do this work on any computer and within any desktop application.
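To make that description concrete, here is a minimal, hypothetical sketch of the observe-decide-act loop such a computer-use agent follows: take a screenshot, ask a vision-language model what to do, and carry out the action it returns as pixel coordinates. The function names and the action format below are illustrative assumptions, not Anthropic’s actual API.

```python
# Hypothetical sketch of a screenshot-driven computer-use agent loop.
# All helpers here are stand-ins; a real agent would call an actual
# model API and an OS-level mouse/keyboard library.

from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # e.g. "click", "type", or "done"
    x: int = 0         # pixel coordinates for a click
    y: int = 0
    text: str = ""     # text to type, if any


def capture_screenshot() -> bytes:
    """Hypothetical: grab an image of what the user currently sees."""
    return b"<png bytes>"


def ask_model(goal: str, screenshot: bytes) -> Action:
    """Hypothetical: send the goal plus screenshot to a vision-language
    model and parse its reply into a single concrete UI action."""
    return Action(kind="done")  # stubbed so the sketch runs as-is


def perform(action: Action) -> None:
    """Hypothetical: dispatch the action to the operating system."""
    print(f"performing {action.kind} at ({action.x}, {action.y})")


def run_agent(goal: str, max_steps: int = 20) -> None:
    # The model only ever sees pixels and replies with pixel coordinates,
    # which is how an agent like Claude "counts the pixels" needed to
    # move the cursor to the right spot before clicking.
    for _ in range(max_steps):
        shot = capture_screenshot()
        action = ask_model(goal, shot)
        if action.kind == "done":
            break
        perform(action)


run_agent("Find a table for two at 7pm at a San Francisco restaurant")
```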
Next out of the gate was Google DeepMind with its Project Mariner, built on top of Google’s Gemini 2 language model. The company showed Mariner off in December but called it an “early research prototype” and said it’s only making the tool available to “trusted testers” for now. As another precaution, Mariner currently only operates within the Chrome browser, and only within an active tab, meaning that it won’t run in the background while you work on other tasks. While this requirement seems to somewhat defeat the purpose of having a time-saving AI helper, it’s likely just a temporary condition for this early…
![A smiling man has a laptop open in front of him with the OpenAI logo on it; the laptop is connected to a screen behind him.](https://spectrum.ieee.org/media-library/a-smiling-man-has-a-laptop-open-in-front-of-him-with-the-openai-logo-on-it-the-laptop-is-connected-to-a-screen-behind-the-man-t.jpg?id=56375796&width=1200&height=600&coordinates=0%2C359%2C0%2C359)
The post “AI Agents Take Control: Exploring Computer-Use Agents” by Eliza Strickland was published on 02/13/2025 by spectrum.ieee.org