Large Language Model’s Context Windows Get Huge

Strike up a conversation with a chatbot and you may run into a frustrating limitation: It can forget what you’re discussing. This happens as earlier parts of the conversation fall out of the large language model’s context window, which is the largest chunk of text it can consider when generating a response.

Magic, an AI software development company, recently claimed an advance that can overcome this problem: a large language model (LLM) with a context window of 100 million tokens. (Tokens are the basic units of text that LLMs process, typically representing words or parts of words.) A context window that long can fit about 750 novels, which is far more than enough to consider an entire chat. It can even allow the user to input tens or hundreds of documents for the LLM to reference.
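As a rough check on that figure, the arithmetic below assumes a typical novel runs about 100,000 words and that English prose averages roughly 1.3 tokens per word; both numbers are illustrative assumptions, not figures from Magic's announcement.

```python
# Back-of-envelope check of the "about 750 novels" figure.
# WORDS_PER_NOVEL and TOKENS_PER_WORD are assumed averages for illustration.

WORDS_PER_NOVEL = 100_000      # assumed length of a typical novel
TOKENS_PER_WORD = 1.3          # assumed tokens per English word
CONTEXT_WINDOW = 100_000_000   # tokens in Magic's claimed context window

tokens_per_novel = WORDS_PER_NOVEL * TOKENS_PER_WORD
novels_in_window = CONTEXT_WINDOW / tokens_per_novel
print(f"~{novels_in_window:.0f} novels fit in a {CONTEXT_WINDOW:,}-token window")
# -> ~769 novels, in the same ballpark as the article's estimate
```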

“The attention span is limited for large language models,” says Naresh Dulam, vice president of software engineering at JPMorgan Chase. “But this attention span keeps on increasing. That’s what the long context window provides. With the attention span increasing, you can put more data in.”

Dramatic Growth of Context Windows

Magic’s claim that its latest LLM can use up to 100 million tokens of context easily tops the previous high-water mark: Google Gemini 1.5 Pro’s context window of up to 2 million tokens. Other popular LLMs, such as the most recent versions of Anthropic’s Claude, OpenAI’s GPT, and Meta’s Llama, have context windows of 200,000 tokens or less.

To evaluate its model, Magic invented a new tool called HashHop, which is available on GitHub. In Magic’s blog post, the authors note that typical evaluations test a model’s memory by inserting an odd phrase in a long text document, such as putting a sentence about a coffee date in the text of Moby Dick. However, models can learn to identify the odd sentence, which lets them succeed on that evaluation while still failing to retrieve other information from long documents. HashHop instead tests a model’s retrieval by providing a long document full of hashes—random strings of letters and numbers—and asking it to find specific ones. In HashHop, Magic’s model recalled hashes with up to 95 percent accuracy in a context window of 100 million tokens. Put more simply: It would be able to recall a single sentence from a corpus of up to 750 novels.
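A minimal sketch of what such a test might look like, based only on the article's description: build a long context of random hash pairs, then ask the model for the value attached to one key. The prompt format and helper names here are hypothetical stand-ins, not Magic's actual HashHop code, which is the version published on GitHub.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def random_hash(length: int = 16) -> str:
    # Produce one random string of letters and digits.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def build_retrieval_prompt(num_pairs: int = 1_000) -> tuple[str, str, str]:
    """Return (prompt, query_key, expected_value) for one retrieval test."""
    pairs = {random_hash(): random_hash() for _ in range(num_pairs)}
    context = "\n".join(f"{key} -> {value}" for key, value in pairs.items())
    query_key, expected_value = next(iter(pairs.items()))
    prompt = (
        f"{context}\n\n"
        f"What value does {query_key} map to? Answer with the hash only."
    )
    return prompt, query_key, expected_value

prompt, key, expected = build_retrieval_prompt()
# A model is scored on whether its completion matches `expected`;
# accuracy over many such prompts measures long-context recall.
```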

“In practice, context just works better” than alternative methods for improving model performance, said Magic CEO Eric Steinberger on the No Priors podcast. Instead of training the model on additional specialized data sets (as in the practice called fine-tuning) or using a retriever algorithm to find data in an external set of documents (as in retrieval-augmented generation), Magic created this long context window that allows users to throw all their data into their prompt. “Our model sees all the data all the time,” Steinberger said.
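The contrast Steinberger draws can be sketched in a few lines. In the hypothetical snippet below, `call_llm` stands in for any chat-completion API, and the keyword-overlap retriever is a deliberately naive stand-in for a real retrieval pipeline; the only point is where the documents end up relative to the prompt.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for an actual LLM API call.
    raise NotImplementedError

def answer_with_long_context(question: str, documents: list[str]) -> str:
    # Long-context approach: every document goes directly into the prompt,
    # so the model "sees all the data all the time."
    context = "\n\n".join(documents)
    return call_llm(f"{context}\n\nQuestion: {question}")

def answer_with_rag(question: str, documents: list[str], top_k: int = 3) -> str:
    # Retrieval-augmented generation: a retriever first narrows the documents
    # to the few that look relevant, and only those enter the prompt.
    scored = sorted(
        documents,
        key=lambda doc: sum(word in doc.lower() for word in question.lower().split()),
        reverse=True,
    )
    context = "\n\n".join(scored[:top_k])
    return call_llm(f"{context}\n\nQuestion: {question}")
```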

But that’s not to say long context windows solve every problem, and Magic’s claims require some…


The post “Large Language Model’s Context Windows Get Huge” by Matthew S. Smith was published on 09/16/2024 by spectrum.ieee.org