Large Language Models Struggle With Reading Clocks

This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.

The rapidly advancing abilities of AI have left many people worried. But don’t fret quite yet: If you can read an analog clock correctly, you are still outperforming AI in that regard.

AI models that are capable of analyzing different types of media in the form of text, images, and video—called multimodal large language models (MLLMs)—are gaining traction in various applications, such as sports analytics and autonomous driving. But sometimes these models fail at what seem like the simplest of tasks, including accurately reading the time from an analog clock. This raises the question of which aspects of image analysis, exactly, these models struggle with.

For example, when it comes to reading traditional clocks, do the models struggle to discern between the short and long hands? Or struggle to pinpoint the exact angle and direction of hands relative to the numbers? The answers to these seemingly trivial questions can provide critical insights into the major limitations of these models.

Javier Conde, an assistant professor at the Universidad Politécnica de Madrid, and colleagues at Politécnico di Milano and Universidad de Valladolid, sought to investigate these limitations in a recent study. The results, published 16 October in IEEE Internet Computing, suggest that if an MLLM struggles with one facet of image analysis, this can cause a cascading effect that impacts other aspects of its image analysis.

How Well Can AI Tell Time?

First, the research team constructed a large dataset of synthetic images of analog clocks, which collectively displayed more than 43,000 indicated times, and tested the ability of four different MLLMs to read the times in a subset of the images. All four models initially failed to tell time accurately. The researchers boosted the models’ performance by fine-tuning them on an additional 5,000 images from the dataset, then testing them on images they hadn’t seen before. However, the models’ performance dropped again when tested against a completely new collection of clock images.
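To make the synthetic-dataset idea concrete, here is a minimal sketch, not the authors’ actual code, of how such images can be labeled: each time maps deterministically to a pair of hand angles, which a renderer could then draw. The `hand_angles` function and the 720-time enumeration are illustrative assumptions, not details from the study.

```python
# Hypothetical sketch: map a clock time to hand angles, measured
# clockwise in degrees from the 12 o'clock position. A renderer
# could use these angles to draw each synthetic clock face.

def hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Return (hour_hand_deg, minute_hand_deg) clockwise from 12."""
    minute_deg = minute * 6.0                      # 360 deg / 60 minutes
    # The hour hand drifts 0.5 deg per minute between hour marks.
    hour_deg = (hour % 12) * 30.0 + minute * 0.5   # 360 deg / 12 hours
    return hour_deg, minute_deg

# Enumerating every minute on a 12-hour face gives 12 * 60 = 720
# distinct (time, angles) labels; the study's dataset is far larger.
dataset = [((h, m), hand_angles(h, m)) for h in range(12) for m in range(60)]
print(len(dataset))        # 720
print(hand_angles(3, 0))   # (90.0, 0.0)
```

Because the labels are generated rather than hand-annotated, a dataset like this can be scaled arbitrarily and perturbed systematically, which is what makes controlled tests of generalization possible.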

The results touch on a key limitation of many AI models: They are good at recognizing data they are familiar with, but often fail to recognize new scenarios they have not yet encountered in their training data. In other words, they often lack generalization.

Conde and his colleagues wanted to dig deeper into what makes it so difficult for MLLMs to tell time. If the problem is related to the model’s sensitivity to the spatial directions of a clock’s hands, then further fine-tuning could address this limitation—simply expose the model to more data and then it will become better at the task at hand.

In a series of experiments, they created new datasets of analog clocks, either with distorted shapes or with altered clock hands, for example with arrows added to the ends. “While such…

The post “Large Language Models Struggle With Reading Clocks” by Michelle Hampson was published on 11/08/2025 by spectrum.ieee.org