Artificial intelligence (AI) models are constantly contending to crush performance benchmarks that gauge their skills. But these models have been slow to surpass AI “IQ” tests that move beyond memorizing, reasoning, and recognizing patterns to measure more profound humanlike facets of intelligence, such as the ability to learn in real time.
One of these more difficult benchmarks is the Abstraction and Reasoning Corpus (ARC) for artificial general intelligence (AGI), introduced by AI researcher and Google senior staff software engineer François Chollet in 2019. The ARC-AGI benchmark measures how efficiently AI systems can acquire new skills outside of their training data; Chollet considers that kind of skill-acquisition efficiency a hallmark of AGI. Since its inception, however, progress on ARC-AGI has been sluggish, with top scores creeping from an already low 21 percent in 2020 to just 30 percent in 2023.
To jumpstart renewed progress, Chollet and Mike Knoop, cofounder of the workflow automation company Zapier, are hosting the ARC Prize. The contest challenges entrants to build an open-source solution that beats the ARC-AGI benchmark, with a pool of more than US $1 million in prize money. A $500,000 grand prize will be split among the top five teams that achieve at least 85 percent performance, while $45,000 will go to the submitted research paper that best furthers the understanding of how to attain high performance on ARC-AGI.
“LLMs are not by themselves intelligent—they’re more like a way to store and retrieve knowledge, which is a component of intelligence, but it is not all there is.” —François Chollet, Google
“Every other AI benchmark out there assumes that the task is fixed and you can prepare for it in advance. What makes this competition special is that it is the only AI benchmark with an emphasis on the ability to understand a new task on the fly,” says Chollet.
Contenders must craft an AI model that solves a set of 100 visual puzzles. Each puzzle presents a new task as a handful of grid pairs that demonstrate sample inputs and their corresponding outputs. The model must infer the rule that turns each input into its output, then apply that rule to a final grid for which only the input is shown, as in the sketch below. This type of puzzle perplexes AI systems because they tend to memorize program templates for problems they've seen rather than adapt to tasks they haven't been trained on.
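To make the setup concrete, here is a minimal Python sketch of what such a task looks like and how a naive solver might attack it: represent each grid as a list of rows of color indices, then test a small set of candidate transformations against the demonstration pairs. The grids, the transformation library, and the function names are illustrative assumptions, not part of the actual benchmark.

```python
# A minimal sketch of an ARC-style task and a naive rule-matching
# solver. All grids and candidate rules here are hypothetical.
from typing import Callable, Optional

Grid = list[list[int]]  # each cell holds a color index 0-9

# Hypothetical demonstration pairs; the hidden rule in this toy task
# is a horizontal flip of the input grid.
train_pairs: list[tuple[Grid, Grid]] = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[4, 5, 6]], [[6, 5, 4]]),
]

# A tiny library of candidate transformations to search over.
candidates: dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: g,
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: g[::-1],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def infer_rule(pairs: list[tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate consistent with every demo pair."""
    for name, fn in candidates.items():
        if all(fn(inp) == out for inp, out in pairs):
            return name
    return None

rule = infer_rule(train_pairs)
print(rule)  # -> "flip_horizontal"

# Apply the inferred rule to the held-out test input.
test_input: Grid = [[7, 8], [9, 0]]
if rule is not None:
    print(candidates[rule](test_input))  # -> [[8, 7], [0, 9]]
```

The catch, of course, is that real ARC-AGI tasks rarely reduce to a single transformation from a fixed menu; each task's rule must be understood fresh from a few examples.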
The ARC Prize has a training dataset and a testing dataset, both of which are public. However, the evaluation dataset assessing model performance is private, so models can’t be trained on it. Submissions also won’t have Internet access, so solutions must be able to run offline, adding another layer of difficulty.
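For context, the public ARC data is distributed as one JSON file per task, each containing "train" and "test" lists of input/output grid pairs. The brief sketch below shows how a solution might load one task; the file path is hypothetical, and the private evaluation set follows the same schema but is never released.

```python
# A sketch of loading one task from the public ARC dataset, which is
# distributed as JSON files (one per task) with "train" and "test"
# sections. The path below is hypothetical.
import json

with open("data/training/task_0001.json") as f:  # hypothetical path
    task = json.load(f)

# Demonstration pairs the model may learn from.
for pair in task["train"]:
    inp, out = pair["input"], pair["output"]  # grids of color indices
    print(f"demo: {len(inp)}x{len(inp[0])} -> {len(out)}x{len(out[0])}")

# Held-out test inputs the model must produce outputs for. Because
# submissions run without Internet access, everything a solution
# needs (models, weights, libraries) must ship with it.
for pair in task["test"]:
    print("test input:", pair["input"])
```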
The approaches Chollet has seen from contestants so far fall into two broad categories: domain-specific language (DSL) program synthesis and fine-tuning large language models (LLMs). The first approach entails developing…
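Roughly speaking, the DSL route means defining a small vocabulary of grid operations and searching for a short program, a composition of those operations, that reproduces every demonstration pair. The toy sketch below illustrates the idea; its primitives, task, and brute-force search are simplified assumptions, not a description of any contestant's system.

```python
# A toy illustration of DSL-style program synthesis for grid tasks:
# enumerate short compositions of primitive operations and keep the
# first program that reproduces every demonstration pair. All
# primitives and tasks here are simplified assumptions.
from itertools import product

Grid = list[list[int]]

PRIMITIVES = {
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: g[::-1],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}

def run(program: tuple[str, ...], grid: Grid) -> Grid:
    """Apply each operation in the program to the grid, in order."""
    for op in program:
        grid = PRIMITIVES[op](grid)
    return grid

def synthesize(pairs, max_len=3):
    """Search programs of increasing length, shortest first."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, i) == o for i, o in pairs):
                return program
    return None

# The hidden rule here is a 180-degree rotation, which this DSL can
# express as a horizontal flip followed by a vertical flip.
pairs = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]
print(synthesize(pairs))  # -> ("flip_h", "flip_v")
```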
The post “AGI: ARC Prize Offers $1 Million to Inspire AI Development” by Rina Diane Caballar was published on 07/30/2024 by spectrum.ieee.org