On 18 April 2024, Meta released the next big thing in “open” AI models: Llama 3. It’s the latest AI model to be offered by Meta free of charge and with a relatively open (though not open-source) license that lets developers deploy it in most commercial apps and services.
Announced less than a year after Llama 2, the new model in many ways repeats its predecessor’s playbook. The Llama 3 release includes models with up to 70 billion parameters, matching Llama 2’s maximum size, and it arrived under a similar license which, although not fully open source, allows commercial use in most circumstances.
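Deploying the model yourself is straightforward with standard tooling. As a minimal sketch, assuming you’ve been granted access to the gated meta-llama/Meta-Llama-3-8B-Instruct checkpoint on Hugging Face and have a GPU with enough memory, generation with the transformers pipeline looks roughly like this:

```python
import torch
from transformers import pipeline  # requires transformers >= 4.40 for Llama 3

# Load the instruction-tuned 8B model (gated: accept Meta's license on the Hub first).
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # needs the accelerate package installed
)

messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "In one sentence, what changed between Llama 2 and Llama 3?"},
]

# With chat-style input, the pipeline applies the model's chat template and
# returns the conversation with the model's reply appended at the end.
out = generator(messages, max_new_tokens=96)
print(out[0]["generated_text"][-1]["content"])
```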
Look closer, however, and Llama 3’s advancements come into focus. Meta’s new model scores significantly better than its predecessor in benchmarks without an increase in model size. The secret is training data—and lots of it.
“What I found most appealing was that at 15 trillion tokens [of training data], there was no stopping. The model was not getting worse,” said Rishi Yadav, founder and CEO of Roost.ai. “Not stopping at 15 trillion is profound. It means the sky’s the limit, at least as of today.”
For AI, more data is king
Meta has yet to release a paper on the details of Llama 3 (it’s promised to do so “in the coming months”), but its announcement revealed it was trained on 15 trillion tokens of data from publicly available sources. That’s over seven times as much data as Llama 2, which was trained on 2 trillion tokens. It may even rival GPT-4: OpenAI hasn’t revealed the number of tokens used to train GPT-4, but estimates put the number at around 13 trillion.
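A token, in this context, is a subword chunk produced by the model’s tokenizer; in English, one token works out to roughly three-quarters of a word. A quick way to build intuition, using tiktoken’s GPT-4 encoding as a stand-in (Llama 3 ships its own 128K-entry BPE vocabulary, but the principle is the same):

```python
import tiktoken

# cl100k_base is the encoding used by GPT-4; it's a convenient stand-in here.
enc = tiktoken.get_encoding("cl100k_base")

text = "Meta trained Llama 3 on 15 trillion tokens of publicly available data."
ids = enc.encode(text)

print(len(ids))         # a short sentence comes out to roughly 15 tokens
print(enc.decode(ids))  # decoding round-trips back to the original text
```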
Llama 3’s vast training data translates to improved performance. The pretrained 70-billion-parameter model’s score in the Massive Multitask Language Understanding (MMLU) benchmark leapt from 68.9 with Llama 2 to 79.5 with Llama 3. The smallest model showed even greater improvement, rising from 45.3 with Llama 2 7B to 66.6 with Llama 3 8B. The MMLU benchmark, first proposed in a 2020 preprint paper, measures a model’s ability to answer questions across a range of academic fields.
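MMLU is multiple choice, so scoring reduces to checking whether the model picks the correct letter, with accuracy averaged across the benchmark’s 57 subjects. A toy sketch of the format (the item below is a made-up stand-in, not an actual MMLU question):

```python
# Hypothetical MMLU-style item; real items come from the benchmark's published test set.
question = "Which gas makes up the largest share of Earth's atmosphere?"
choices = {"A": "Oxygen", "B": "Nitrogen", "C": "Carbon dioxide", "D": "Argon"}
answer_key = "B"

prompt = (
    "The following is a multiple choice question. "
    "Respond with the letter of the correct answer.\n\n"
    f"{question}\n"
    + "\n".join(f"{letter}. {text}" for letter, text in choices.items())
    + "\nAnswer:"
)

# In a real harness, the model's completion (or its most probable next token)
# is read off here; the reported score is the fraction of items answered correctly.
model_choice = "B"  # stand-in for the model's output
print("correct" if model_choice == answer_key else "incorrect")
```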
Llama 3 also defeats competing small and midsize models, like Google Gemini and Mistral 7B, across a variety of benchmarks, including MMLU. These results suggest that, at least for the moment, there’s no limit to the volume of training data that can be useful.
Llama 3 400B is still in training, but it already posts impressive benchmark results. [Image: Meta]
Meta also announced plans to push open models forward in another key metric: model size. A version of Llama 3 with 400 billion parameters is slated for release later this year. Meta said the model was still in training but, as of 15 April, it had already scored 86.1 on the MMLU benchmark. That’s only a hair behind GPT-4, which scored 86.4.
That result, if borne out in the final release, would easily leapfrog other large open models, like Falcon 180B and Grok-1. Llama 3 400B could become the first open LLM to match the quality of…
The post “Llama 3 Establishes Meta as the Leader in “Open” AI” by Matthew S. Smith was published on 25 April 2024 by spectrum.ieee.org