Gemini 1.5: Google’s new AI model already has a major update

Google has released a new version of Gemini, its family of multimodal large language models, only a week after rolling out its Gemini 1.0 Ultra model. Gemini 1.5 is the tech giant’s next-generation model, which is said to have “dramatically enhanced performance.” According to Google DeepMind, Alphabet’s AI division, the first release, 1.5 Pro, can process up to one million tokens of information.

Suswati Basu

Freelance journalist

Introducing Gemini 1.5: our next-generation model with dramatically enhanced performance. It also achieves a breakthrough in long-context understanding.

The first release is 1.5 Pro, capable of processing up to 1 million tokens of information. 🧵 https://t.co/qT0aXdFL0n pic.twitter.com/xA0ib11f00

— Google DeepMind (@GoogleDeepMind) February 15, 2024

In a blog post on Google’s website, CEO Sundar Pichai said, “Our teams continue pushing the frontiers of our latest models with safety at the core. They are making rapid progress.

“This new generation also delivers a breakthrough in long-context understanding,” he continued, stating that the model could now consistently handle up to one million tokens, “achieving the longest context window of any large-scale foundation model yet.”

WIRED reported that Demis Hassabis, CEO of Google DeepMind, drew parallels between its immense input capacity and a person’s working memory, reflecting on insights he gained as a neuroscientist years ago.

The model is said to deliver significant performance improvements and to be more efficient to train and serve, thanks to what Google describes as a new Mixture-of-Experts (MoE) architecture.
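
For readers unfamiliar with the term, a Mixture-of-Experts layer sends each input to only a few specialised sub-networks, or “experts”, picked by a router, which is why it can be cheaper to run than a model that uses all of its parameters every time. The Python sketch below is a toy illustration of that general idea and not Google’s implementation: the `moe_forward` helper, the four lambda “experts” and the hard-coded gate scores are all hypothetical, and in a real model both the router and the experts would be learned neural network layers.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_scores, top_k=2):
    """Run only the top_k highest-scoring experts and blend their outputs.

    Assumption: gate_scores are passed in directly; a real router would
    compute them from x with learned weights.
    """
    probs = softmax(gate_scores)
    # Pick the indices of the top_k experts by router probability.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalise so the chosen experts' weights sum to 1.
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen


# Four toy "experts"; each is just a simple function here.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]

# Hypothetical router scores that favour experts 1 and 2 equally.
output, chosen = moe_forward(3.0, experts, gate_scores=[0.1, 2.0, 2.0, -1.0], top_k=2)
print(chosen, output)
```

Only two of the four experts are evaluated for this input, which is the source of the efficiency gain such architectures aim for.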

What can Google’s Gemini 1.5 do?

According to a thread posted on Google DeepMind’s profile on X, data scientists tested Gemini 1.5 using a series of text, code, image, audio, and video evaluations and found that 1.5 Pro outperformed 1.0 Pro on 87% of the benchmarks Google uses to develop its LLMs.

One million tokens are reportedly equivalent to over 700,000 words, more than 30,000 lines of code, 11 hours of audio, or one hour of video. This exceeds the context window of other AI models, such as OpenAI’s GPT-4, which powers ChatGPT.
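
To put those figures in perspective, the equivalences above can be turned into a rough back-of-the-envelope calculator. The Python sketch below simply scales the numbers quoted in this article; `estimate_tokens` and `fits_in_window` are hypothetical helper names, and real token counts depend on the tokenizer and the content, so treat the results as ballpark estimates only.

```python
# Ratios taken from the figures quoted in the article.
# Real tokenizers vary by language and content, so these are rough guides.
CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_WINDOW = 700_000        # "over 700,000 words"
CODE_LINES_PER_WINDOW = 30_000    # "more than 30,000 lines of code"
AUDIO_HOURS_PER_WINDOW = 11       # "11 hours of audio"
VIDEO_HOURS_PER_WINDOW = 1        # "one hour of video"


def estimate_tokens(words=0, code_lines=0, audio_hours=0.0, video_hours=0.0):
    """Rough token estimate for a mixed prompt (hypothetical helper)."""
    return round(
        words * CONTEXT_WINDOW_TOKENS / WORDS_PER_WINDOW
        + code_lines * CONTEXT_WINDOW_TOKENS / CODE_LINES_PER_WINDOW
        + audio_hours * CONTEXT_WINDOW_TOKENS / AUDIO_HOURS_PER_WINDOW
        + video_hours * CONTEXT_WINDOW_TOKENS / VIDEO_HOURS_PER_WINDOW
    )


def fits_in_window(tokens, window=CONTEXT_WINDOW_TOKENS):
    """Return True if the estimated prompt fits in the context window."""
    return tokens <= window


# Example: a 90,000-word novel plus a 20-minute video clip.
novel_and_clip = estimate_tokens(words=90_000, video_hours=20 / 60)
print(novel_and_clip, fits_in_window(novel_and_clip))
```

By these ratios, a typical novel uses only a fraction of the window, while a single hour of video fills it completely.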

The team claimed that 1.5 Pro was even able to learn a new skill from information given in a prompt. They said, “When provided with a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, it could translate from English at a similar level to people learning from the same content.”

The report also mentioned that during the analysis of a 402-page PDF containing the Apollo 11 communications transcript, the model was tasked with identifying humorous segments. It highlighted several instances, including a moment when astronauts joked that a delay in communications was because of a sandwich break.

The department’s Research and Deep Learning Lead Oriol Vinyals called it a “drastic model” but cautioned that “like with any machine learning model, it sometimes doesn’t get it right.”

To show what’s possible with the drastically huge context window in Gemini 1.5 Pro, we prompted it with the three.js examples code – over 100,000 lines of code/800k+ tokens!

(That’s not even the max, it can handle millions of tokens 😀)

Gemini was able to process all the code… pic.twitter.com/N2VYgn2JFJ

— Oriol Vinyals (@OriolVinyalsML) February 15, 2024

How do I access Google Gemini 1.5?

A limited number of developers and cloud customers have been granted access to 1.5 Pro in a private preview through AI Studio and Vertex AI. There is no date yet for a general release.

Gemini, previously known as Bard, was released this week for Android and iOS users in the U.S. after being rebranded. The rapid advances in generative AI stand in sharp contrast to concerns about the technology’s potential risks. Google asserts that it has subjected Gemini 1.5 Pro to “extensive evaluations” and believes that offering restricted access is a way to collect feedback on possible dangers.

In addition to this, the company said it has granted researchers at the UK’s AI Safety Institute access to its most advanced models for evaluation purposes.

Featured image: Canva / DALL·E

Suswati Basu is a multilingual, award-winning editor and the founder of the intersectional literature channel, How To Be Books. She was shortlisted for the Guardian Mary Stott Prize and longlisted for the Guardian International Development Journalism Award.

With 18 years of experience in the media industry, Suswati has held significant roles such as head of audience and deputy editor for NationalWorld news, and digital editor for Channel 4 News and ITV News. She has also contributed to the Guardian and received training at the BBC. As an audience, trends, and SEO specialist, she has participated in panel events alongside Google.

Her career also includes a seven-year tenure at the leading AI company Dataminr, where she led the Europe desk and launched the company’s first employee resource group for disabilities. Before this, Suswati worked as a journalist in China for four years, investigating censorship and the Great Firewall, and acquired proficiency in several languages.

In recent years, Suswati has been nominated for six awards, including the Independent Podcast Awards, International Women’s Podcast Awards, and the Anthem Awards for her literary social affairs show.

Her areas of speciality span a wide range, including technology, Diversity, Equity, and Inclusion (DEI), social politics, mental health, and nonfiction books.