AI Struggles with Basic Task: Telling Time
These days, artificial intelligence can generate photorealistic images, write novels, do your homework, and even predict protein structures. But when it comes to something as simple as telling time, it often trips up. New research reveals this deficiency: AI systems consistently fail at basic tasks like interpreting images of clocks and calendars.
Scientists from the University of Edinburgh tested the ability of seven well-known multimodal large language models (MLLMs) to answer time-related questions based on various images. Their study, currently hosted on the preprint server arXiv, shows that these models struggle with such basic tasks.
"Interpreting and reasoning about time from visual inputs is crucial for many real-world applications, such as event scheduling, autonomous systems, and more," the researchers wrote in the study. "Despite advancements in multimodal large language models (MLLMs), most work has focused on object detection, image captioning, or scene understanding, leaving temporal inference underexplored."
The team tested OpenAI's GPT-4o and o1; Google DeepMind's Gemini 2.0; Anthropic's Claude 3.5 Sonnet; Meta's Llama 3.2-11B-Vision-Instruct; Alibaba's Qwen2-VL-7B-Instruct; and ModelBest's MiniCPM-V-2.6. They showed each model various images of analog clocks and calendars.
For the clock images, the researchers asked the models to identify the time shown. For the calendar images, they posed simple questions like, "What day of the week is New Year's Day?" and more challenging queries such as, "What is the 153rd day of the year?"
"Reading the time on analog clocks and understanding calendar layouts demand fine-grained visual recognition and non-trivial numerical reasoning," the researchers explained.
Overall, the AI systems failed to deliver. They read the time on analog clocks correctly less than 25% of the time. They fumbled with clocks bearing Roman numerals or stylized hands just as much as with clocks lacking a second hand, suggesting that their issues may stem from detecting the hands and interpreting their angles on the clock face.
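To make the angle point concrete: once the hand positions are known, converting them to a time is straightforward arithmetic, which is why the visual step of reading the hands off the face looks like the likely failure point. Below is a minimal sketch; the `time_from_hand_angles` function is hypothetical, not from the study.

```python
def time_from_hand_angles(hour_angle_deg: float, minute_angle_deg: float) -> str:
    """Convert clock-hand angles, measured clockwise from 12 o'clock, to a time string.

    Assumes the hand angles have already been read off the image; the study
    suggests that visual step is where the models go wrong.
    """
    minutes = round(minute_angle_deg / 6) % 60       # the minute hand sweeps 6 degrees per minute
    hours = int(hour_angle_deg // 30) % 12 or 12     # the hour hand sweeps 30 degrees per hour
    return f"{hours}:{minutes:02d}"

print(time_from_hand_angles(hour_angle_deg=97.5, minute_angle_deg=90.0))  # -> "3:15"
```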
Google's Gemini 2.0 performed best on the team's clock task, while OpenAI's o1 achieved 80% accuracy on the calendar task, a result far better than its competitors. Yet even the most successful MLLM on the calendar task made mistakes about 20% of the time.
"Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people," Rohit Saxena, a co-author of the study and Ph.D. student at the University of Edinburgh's School of Informatics, said in a university statement. "These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications."
In other words, AI might still be capable of doing your homework, but it will likely miss the deadline.
Insights:
- Multimodal large language models can analyze images and generate text about visual content, but interpreting time-related visual elements such as clocks and calendars may require specialized training data; fine-tuning on task-specific datasets could improve performance (a rough sketch of how such data might be generated appears after this list). Potential applications include assisting visually impaired users.
- Future AI development may involve specialized training for interpreting time-related visuals such as clocks and calendars, addressing the shortcomings identified in today's multimodal models.
- AI and multimodal large language models (MLLMs) have achieved impressive feats across many tasks, but their ability to read time from visual inputs remains underexplored and leaves clear room for improvement.
- Integrating AI systems into time-sensitive, real-world applications may run into hurdles as long as the models cannot reliably tell time, as the study's clock and calendar tasks show.
- AI can predict protein structures, yet it often struggles with simple visual elements such as clocks and calendars, revealing a significant gap between its capabilities and tasks people handle with ease.
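On the fine-tuning point above: one common way to build a task-specific dataset is to render synthetic clock faces whose ground-truth times are known by construction. The sketch below, using the Pillow imaging library, is purely illustrative; the study did not publish a training recipe, and `render_clock` is a hypothetical helper.

```python
import math
import random
from PIL import Image, ImageDraw

def render_clock(hour: int, minute: int, size: int = 224) -> Image.Image:
    """Render a plain analog clock face showing the given time."""
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    center = size / 2
    draw.ellipse([8, 8, size - 8, size - 8], outline="black", width=3)

    def draw_hand(angle_deg: float, length: float, width: int) -> None:
        # 0 degrees points to 12 o'clock; angles increase clockwise.
        rad = math.radians(angle_deg - 90)
        draw.line([center, center,
                   center + length * math.cos(rad),
                   center + length * math.sin(rad)],
                  fill="black", width=width)

    draw_hand((hour % 12 + minute / 60) * 30, center * 0.5, 6)  # hour hand
    draw_hand(minute * 6, center * 0.8, 3)                      # minute hand
    return img

# Build (image, label) pairs; the label is the ground-truth time the image shows.
dataset = [
    (render_clock(h, m), f"{h}:{m:02d}")
    for h, m in ((random.randint(1, 12), random.randint(0, 59)) for _ in range(1000))
]
```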