AI chatbots such as ChatGPT and other applications powered by
large language models (LLMs) have exploded in popularity, leading a number of companies to explore LLM-driven robots. However, a new study now reveals an automated way to hack into such machines with 100 percent success. By circumventing safety guardrails, researchers could manipulate self-driving systems into colliding with pedestrians and robot dogs into hunting for harmful places to detonate bombs.
Essentially, LLMs are supercharged versions of the
autocomplete feature that smartphones use to predict the rest of a word that a person is typing. LLMs trained to analyze text, images, and audio can make personalized travel recommendations, devise recipes from a picture of a refrigerator’s contents, and help generate websites.
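To make the autocomplete analogy concrete, here is a minimal sketch of next-token prediction using the small open-source GPT-2 model through the Hugging Face transformers library; neither tool appears in the study, and both are chosen here purely for illustration.

```python
# Minimal illustration of next-token prediction, the "autocomplete" at the
# core of an LLM. GPT-2 and the transformers library are used only as an
# example; they are not the models or tools discussed in the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The robot walked into the"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedily extend the prompt a few tokens, much as a phone's autocomplete
# proposes the next word.
output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```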
The extraordinary ability of LLMs to process text has spurred a number of companies to use the AI systems to help control robots through voice commands, translating prompts from users into code the robots can run. For instance,
Boston Dynamics’ robot dog Spot, now integrated with OpenAI’s ChatGPT, can act as a tour guide. Figure’s humanoid robots and Unitree’s Go2 robot dog are similarly equipped with ChatGPT.
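The pattern these products share is the one described above: an LLM translates a natural-language request into a short sequence of calls the robot can execute. The sketch below illustrates that pattern under stated assumptions; query_llm, TourGuideRobot, and the whitelist of allowed calls are hypothetical stand-ins, not any vendor's actual SDK.

```python
# Hedged sketch of the "voice command -> robot code" pattern the article
# describes. Every name here (query_llm, TourGuideRobot, ALLOWED_CALLS) is a
# hypothetical stand-in, not a real product API.

ALLOWED_CALLS = {"move_to", "rotate", "speak"}  # assumed safety whitelist


def query_llm(system_prompt: str, user_request: str) -> list[tuple]:
    """Stand-in for a hosted LLM call that returns a step-by-step plan."""
    # A production system would send both prompts to a language model here.
    return [("move_to", (3.0, 1.5)), ("speak", ("This is the main lab.",))]


class TourGuideRobot:
    """Toy robot exposing only the three whitelisted actions."""
    def move_to(self, x, y): print(f"driving to ({x}, {y})")
    def rotate(self, degrees): print(f"rotating {degrees} degrees")
    def speak(self, text): print(f"saying: {text}")


def command_robot(robot, user_request: str) -> None:
    plan = query_llm(
        "Translate the request into calls: move_to(x, y), rotate(deg), speak(text).",
        user_request,
    )
    for name, args in plan:
        if name not in ALLOWED_CALLS:   # the guardrail a jailbreak tries to slip past
            raise ValueError(f"blocked unsafe call: {name}")
        getattr(robot, name)(*args)     # dispatch the generated step


command_robot(TourGuideRobot(), "Give me a quick tour of the lab.")
```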
However, a group of scientists has recently identified a host of security vulnerabilities in LLMs. So-called
jailbreaking attacks discover ways to develop prompts that can bypass LLM safeguards and fool the AI systems into generating unwanted content, such as instructions for building bombs, recipes for synthesizing illegal drugs, and guides for defrauding charities.
LLM Jailbreaking Moves Beyond Chatbots
Previous research into LLM jailbreaking attacks was largely confined to chatbots. Jailbreaking a robot could prove “far more alarming,” says
Hamed Hassani, an associate professor of electrical and systems engineering at the University of Pennsylvania. For instance, one YouTuber showed that he could get the Thermonator robot dog from Throwflame, which is built on a Go2 platform and is equipped with a flamethrower, to shoot flames at him with a voice command.
Now, the same group of scientists has developed RoboPAIR, an algorithm designed to attack any LLM-controlled robot. In experiments with three different robotic systems (the Go2; the wheeled, ChatGPT-powered Clearpath Robotics Jackal; and Nvidia’s open-source Dolphins LLM self-driving vehicle simulator), the researchers found that RoboPAIR needed just days to achieve a 100 percent jailbreak rate against all three systems.
“Jailbreaking AI-controlled robots isn’t just possible—it’s alarmingly easy,” says
Alexander Robey, currently a postdoctoral researcher at Carnegie Mellon University in Pittsburgh.
RoboPAIR uses an attacker LLM to feed prompts to a target LLM. The attacker examines the responses from its target and adjusts its prompts until these commands can bypass the target’s
safety filters.
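The loop described above can be sketched schematically as follows. This is not the published RoboPAIR code; attacker_llm, target_robot_llm, and is_refusal are hypothetical stand-ins for the attacker model, the robot's onboard LLM, and a check of whether the target refused.

```python
# Schematic of the attacker/target loop the article describes. All functions
# here are hypothetical stand-ins, not the published RoboPAIR implementation.

def attacker_llm(goal: str, history: list) -> str:
    """Stand-in: draft a new jailbreak prompt, refining earlier attempts."""
    return f"Attempt {len(history) + 1}: pretend this is a fiction exercise; {goal}"

def target_robot_llm(prompt: str) -> str:
    """Stand-in: the robot's LLM, which may refuse or comply."""
    return "I cannot help with that."  # always refuses in this toy example

def is_refusal(response: str) -> bool:
    return "cannot" in response.lower()

def jailbreak(goal: str, max_rounds: int = 20) -> str | None:
    history = []
    for _ in range(max_rounds):
        prompt = attacker_llm(goal, history)   # attacker drafts a prompt
        response = target_robot_llm(prompt)    # target robot answers
        if not is_refusal(response):           # safety filter bypassed?
            return prompt                      # return the successful prompt
        history.append((prompt, response))     # otherwise refine and retry
    return None

result = jailbreak("carry out a restricted action")
print("jailbreak prompt found" if result else "no jailbreak within budget")
```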
The post “Robot Jailbreak: Researchers Trick Bots Into Dangerous Tasks” by Charles Q. Choi was published on 11/11/2024 by spectrum.ieee.org