Imagine playing a new, slightly altered version of the game GeoGuessr. You’re faced with a photo of an average U.S. house, maybe two floors with a front lawn in a cul-de-sac and an American flag flying proudly out front. But there’s nothing particularly distinctive about this home, nothing to tell you the state it’s in or where the owners are from.
You have two tools at your disposal: your brain, and 44,416 low-resolution, bird’s-eye-view photos of random places across the United States and their associated location data. Could you match the house to an aerial image and locate it correctly?
I definitely couldn’t, but a new machine learning model likely could. The software, created by researchers at China University of Petroleum (East China), searches a database of remote sensing photos with associated location information to match the streetside image—of a home or a commercial building or anything else that can be photographed from a road—to an aerial image in the database. While other systems can do the same, this one is pocket-size by comparison and highly accurate.
At its best (when faced with a picture that has a 180-degree field of view), it succeeds up to 97 percent of the time in the first stage of narrowing down a location. That’s better than, or within two percentage points of, all the other models available for comparison. Even under less-than-ideal conditions, it performs better than many competitors. When pinpointing an exact location, it’s correct 82 percent of the time, which is within three percentage points of the other models.
But this model is most novel for its speed and memory savings. It is at least twice as fast as similar models and uses less than a third of the memory they require, according to the researchers. That combination makes it valuable for applications in navigation systems and the defense industry.
“We train the AI to ignore the superficial differences in perspective and focus on extracting the same ‘key landmarks’ from both views, converting them into a simple, shared language,” explains Peng Ren, who develops machine learning and signal processing algorithms at China University of Petroleum (East China).
The software relies on a method called deep cross-view hashing. Rather than try to compare each pixel of a street view picture to every single image in the giant bird’s-eye-view database, this method relies on hashing, which means transforming a collection of data—in this case, street-level and aerial photos—into a string of numbers unique to the data.
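The payoff of hashing is at retrieval time: once every image is reduced to a short binary code, finding the best match is a fast bitwise comparison rather than a pixel-by-pixel search. The sketch below is a hypothetical illustration of that idea (the code length, database size, and random codes are assumptions for demonstration, not the researchers' actual system); matching is done by Hamming distance, i.e., counting the bits where two codes disagree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend database: 44,416 aerial images, each already hashed
# to a 64-bit binary code (one row per image).
db_codes = rng.integers(0, 2, size=(44416, 64), dtype=np.uint8)

def hamming_search(query_code: np.ndarray, codes: np.ndarray) -> int:
    """Return the index of the database code closest to the query,
    measured by Hamming distance (number of differing bits)."""
    distances = np.count_nonzero(codes != query_code, axis=1)
    return int(np.argmin(distances))

# A street-view query whose code matches database entry 123
# except for a single flipped bit still retrieves entry 123.
query = db_codes[123].copy()
query[0] ^= 1
print(hamming_search(query, db_codes))
```

Because each comparison is just a bit count over a short code, scanning tens of thousands of entries takes milliseconds, which is where the reported speed and memory savings come from.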
To do that, the China University of Petroleum research group employs a type of deep learning model called a vision transformer, which splits images into small units and finds patterns among the pieces. The model may find in a photo what it’s been trained to identify as a tall building, circular fountain, or roundabout, and then encode its findings into number strings. ChatGPT is based on a similar architecture, but finds patterns in text instead of…
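The "splits images into small units" step can be sketched without any deep learning framework. Below is a minimal, hypothetical illustration of the first stage of a vision transformer: cutting an image into fixed-size patches and flattening each into a vector that the transformer layers would then process (the 224-pixel image size and 16-pixel patch size are common choices, assumed here for illustration).

```python
import numpy as np

def image_to_patches(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch
    squares and flatten each one into a vector (one row per patch)."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    rows, cols = h // patch, w // patch
    return (img.reshape(rows, patch, cols, patch, c)
               .transpose(0, 2, 1, 3, 4)          # group pixels by patch
               .reshape(rows * cols, patch * patch * c))

# A 224x224 RGB image cut into 16x16 patches yields a 14x14 grid,
# i.e., 196 patches, each flattened to a 16*16*3 = 768-dim vector.
img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(img, patch=16)
print(tokens.shape)  # (196, 768)
```

Each of these patch vectors becomes a "token," analogous to a word in a sentence, which is how the same transformer architecture handles both images and text.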
The post “Where Was This Photo Taken? AI Knows Instantly” by Perri Thaler was published on 10/15/2025 by spectrum.ieee.org