Small Language Models: Apple, Microsoft Debut LLM Alternative

Tech companies have been caught up in a race to build the biggest large language models (LLMs). In April, for example, Meta announced the 400-billion-parameter Llama 3, which contains twice the number of parameters—or variables that determine how the model responds to queries—than OpenAI’s original ChatGPT model from 2022. Although not confirmed, GPT-4 is estimated to have about 1.8 trillion parameters.

In the last few months, however, some of the largest tech companies, including Apple and Microsoft, have introduced small language models (SLMs). These models are a fraction of the size of their LLM counterparts and yet, on many benchmarks, can match or even outperform them in text generation.

On 10 June, at Apple’s Worldwide Developers Conference, the company announced its “Apple Intelligence” models, which have around 3 billion parameters. And in late April, Microsoft released its Phi-3 family of SLMs, featuring models housing between 3.8 billion and 14 billion parameters.

OpenAI’s CEO Sam Altman believes we’re at the end of the era of giant models.

In a series of tests, the smallest of Microsoft’s models, Phi-3-mini, rivalled OpenAI’s GPT-3.5 (175 billion parameters), which powers the free version of ChatGPT, and outperformed Google’s Gemma (7 billion parameters). The tests evaluated how well a model understands language by prompting it with questions about mathematics, philosophy, law, and more. What’s more interesting, Microsoft’s Phi-3-small, with 7 billion parameters, fared remarkably better than GPT-3.5 in many of these benchmarks.

Aaron Mueller, who researches language models at Northeastern University in Boston, isn’t surprised SLMs can go toe-to-toe with LLMs in select functions. He says that’s because scaling the number of parameters isn’t the only way to improve a model’s performance: Training it on higher-quality data can yield similar results too.

Microsoft’s Phi models were trained on fine-tuned “textbook-quality” data, says Mueller, which have a more consistent style that’s easier to learn from than the highly diverse text from across the Internet that LLMs typically rely on. Similarly, Apple trained its SLMs exclusively on richer and more complex datasets.

The rise of SLMs comes at a time when the performance gap between LLMs is quickly narrowing and tech companies look to deviate from standard scaling laws and explore other avenues for performance upgrades. At an event in April, OpenAI’s CEO Sam Altman said he believes we’re at the end of the era of giant models. “We’ll make them better in other ways.”

Because SLMs don’t consume nearly as much energy as LLMs, they can also run locally on devices like smartphones and laptops (instead of in the cloud) to preserve data privacy and personalize them to each person. In March, Google rolled out Gemini Nano to the company’s Pixel line of smartphones. The SLM can summarize audio recordings and produce smart replies to conversations…

Read full article: Small Language Models: Apple, Microsoft Debut LLM Alternative

The post “Small Language Models: Apple, Microsoft Debut LLM Alternative” by Shubham Agarwal was published on 06/20/2024 by spectrum.ieee.org