AI Pioneer Fei-Fei Li Has a Vision for Computer Vision

Stanford University professor Fei-Fei Li has already earned her place in the history of AI. She played a major role in the deep learning revolution by laboring for years to create the ImageNet dataset and competition, which challenged AI systems to recognize objects and animals across 1,000 categories. In 2012, a neural network called AlexNet sent shockwaves through the AI research community when it resoundingly outperformed all other types of models and won the ImageNet contest. From there, neural networks took off, powered by the vast amounts of free training data now available on the Internet and GPUs that deliver unprecedented compute power.

In the 13 years since ImageNet, computer vision researchers mastered object recognition and moved on to image and video generation. Li cofounded Stanford’s Institute for Human-Centered AI (HAI) and continued to push the boundaries of computer vision. Just this year she launched a startup, World Labs, which generates 3D scenes that users can explore. World Labs is dedicated to giving AI “spatial intelligence,” or the ability to generate, reason within, and interact with 3D worlds. Li delivered a keynote yesterday at NeurIPS, the massive AI conference, about her vision for machine vision, and she gave IEEE Spectrum an exclusive interview before her talk.

Why did you title your talk “Ascending the Ladder of Visual Intelligence”?

Fei-Fei Li: I think it’s intuitive that intelligence has different levels of complexity and sophistication. In the talk, I want to deliver the sense that over the past decades, especially the past 10-plus years of the deep learning revolution, the things we have learned to do with visual intelligence are just breathtaking. We are becoming more and more capable with the technology. And I was also inspired by Judea Pearl’s “ladder of causality” [in his 2018 book The Book of Why].

The talk also has a subtitle, “From Seeing to Doing.” This is something that people don’t appreciate enough: that seeing is closely coupled with interaction and doing things, both for animals and for AI agents. And this is a departure from language. Language is fundamentally a communication tool that’s used to get ideas across. In my mind, these are very complementary, but equally profound, modalities of intelligence.

Do you mean that we instinctively respond to certain sights?

Li: I’m not just talking about instinct. If you look at the evolution of perception and the evolution of animal intelligence, it’s deeply, deeply intertwined. Every time we’re able to get more information from the environment, the evolutionary force pushes capability and intelligence forward. If you don’t sense the environment, your relationship with the world is very passive; whether you eat or become eaten is a very passive act. But as soon as you are able to take cues from the environment through perception, the evolutionary pressure really heightens, and that drives intelligence forward.

Do you…

The post “AI Pioneer Fei-Fei Li Has a Vision for Computer Vision” by Eliza Strickland was published on 12/12/2024 by spectrum.ieee.org