So what does o3’s high score really mean?
The o3 model’s high score comes as the tech industry and AI researchers reckon with a slower pace of progress in the latest AI models during 2024, compared with the initial explosive developments of 2023.
Although o3 did not win the ARC Challenge, its high score indicates that AI models could beat the competition benchmark in the near future. Beyond o3’s unofficial high score, Chollet says many official low-compute submissions have already scored above 81 per cent on the private evaluation test set.
Dietterich also thinks that “this is a very impressive leap in performance”. However, he cautions that, without knowing more about how OpenAI’s o1 and o3 models work, it is impossible to evaluate just how impressive the high score is. For instance, if o3 was able to practise the ARC problems in advance, then that would make its achievement easier. “We will need to await an open-source replication to understand the full significance of this,” says Dietterich.
The ARC Challenge organisers are already looking to launch a second and more difficult set of benchmark tests sometime in 2025. They will also keep the ARC Prize 2025 challenge running until someone wins the grand prize and open-sources their solution.