Humanoid Robot Olympics: Tackling Everyday Chores

I was a little disappointed by China’s World Humanoid Robot Games.¹ As fun as real-life Rock ‘Em Sock ‘Em Robots is, what people really care about is robots doing their chores. This is why robot laundry folding videos are so popular: we didn’t know how to do that even a few years ago. And it is certainly something that people want! But as this article so nicely articulates, basic laundry folding is in a sweet spot given the techniques we have now. It might feel like if our AI techniques can fold laundry, maybe they can do anything—but that isn’t true, and we’re going to have to invent new techniques to be really general purpose and useful.

Contents

Current State of The Art

Event 1: Doors

Event 2: Laundry

Event 3: Tools

Event 4: Fingertip Manipulation

Event 5: Wet Manipulation


With that in mind I am issuing a challenge to roboticists: Here are my Humanoid Olympic events. Each event will require us to push the state of the art and unlock new capabilities in robotic manipulation. I will update my Substack as folks achieve these milestones, and will mail actual real-life medals to the winners.

Current State of The Art

To explain why each of these challenges pushes the state of the art in robotic manipulation, let’s first talk about what’s working now. What I’m seeing working is learning from demonstration. Generally, folks are using puppeteering interfaces. The most common seem to be two copies of the robot, so that a human can grab and move one of them while the other follows, or a virtual reality headset with controllers for hand tracking. They then record some 10-30 second activity hundreds of times over. From that data, a neural network is trained to mimic those examples. This technique has unlocked tasks that have steps that are somewhat chaotic (like pulling a corner of a towel to get it to lie flat) or that have a very large state space (like how a towel can be bunched up in myriad different ways).
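To make the record-then-mimic loop concrete, here is a toy sketch of learning from demonstration (often called behavior cloning). It is not anyone's actual pipeline. Everything in it is an assumption for illustration: a one-dimensional "gripper", a scripted `expert_action` standing in for the human teleoperator, and a two-parameter linear policy fit by least squares standing in for the neural network.

```python
import random

def expert_action(error):
    """Hypothetical stand-in for the teleoperator: move halfway to the target."""
    return 0.5 * error

def record_demonstration(rng, steps=20):
    """Record one short episode as (observation, action) pairs."""
    position = rng.uniform(-1.0, 1.0)
    target = rng.uniform(-1.0, 1.0)
    pairs = []
    for _ in range(steps):
        error = target - position  # the observation the policy gets to see
        action = expert_action(error) + rng.gauss(0.0, 0.01)  # slightly noisy human
        pairs.append((error, action))
        position += action
    return pairs

def fit_linear_policy(pairs):
    """Closed-form least squares fit of action = w * observation + b."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def rollout(w, b, position, target, steps=20):
    """Run the learned policy on its own, with no human in the loop."""
    for _ in range(steps):
        position += w * (target - position) + b
    return position

rng = random.Random(0)
# "Hundreds of times over": 200 recorded episodes, flattened into one dataset.
dataset = [pair for _ in range(200) for pair in record_demonstration(rng)]
w, b = fit_linear_policy(dataset)
final = rollout(w, b, position=0.0, target=0.8)
print(f"learned w={w:.3f}, b={b:.4f}, final position={final:.3f}")
```

The learned policy only ever sees what the demonstrations contain, so it can at best reproduce the demonstrator's behavior. Real systems swap the scalar observation for camera images and joint states and the linear fit for a large neural network, but the supervised structure is the same.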

Thinking about this method of training robots to do things, it should be clear what some of the limitations are. Each of these has exceptions, but together they form a general trend.

No force feedback at the wrists.² The robot can only ever perform as well as the human teleoperating it, but we don’t yet have good standardized ways of getting high resolution force information to the human teleoperator.
Limited finger control.³ It’s hard for the teleoperator (and for a foundation model) to see and control all of a robot’s fingers with much more finesse than just opening and closing them.
No sense of touch.⁴ Human hands are packed absolutely full of sensors. Getting anywhere near that kind of sensing out of robot hands in a way that’s usable by a human teleoperator is not currently possible.
Medium precision.⁵ Based on videos I’ve seen, I think we’ve got about 1-3 cm precision for tasks.