ClockBench: Even the best AI models can't reliably read the clock (clockbench.ai)
Pro@programming.dev to Technology@lemmy.world · English · 19 days ago
earthworm@sh.itjust.works · English · 19 days ago

This seems like a dumb benchmark.

"ClockBench evaluates whether models can read analog clocks - a task that is trivial for humans, but current frontier models struggle with."

What do you mean trivial? Most humans I know can’t read the most basic white-background-big-black-numbers clocks.

Someone rigged the jury to get 90% on this: