Google Launches Its Most Powerful AI Model, Gemini
On Wednesday, Google launched Gemini, a new multimodal, general-purpose AI (artificial intelligence) model, to compete with products from OpenAI, Microsoft, and Meta.
According to the search giant, Gemini is the ‘largest and most capable’ large language model (LLM) that the company has ever built, with state-of-the-art performance across many leading benchmarks.
Developed by the Google DeepMind AI unit, the model was trained on Google’s Tensor Processing Units (TPUs), which the company says makes it run significantly faster than its earlier, smaller, and less capable models. It can generalize and seamlessly understand, operate across, and combine different types of information, including text, code, audio, images, and video.
Google is releasing the first version, Gemini 1.0, in three sizes: Gemini Ultra, its largest and most capable model, for highly complex tasks; Gemini Pro, its best model for scaling across a wide range of tasks; and Gemini Nano, its most efficient model, for on-device tasks.
“These are the first models of the Gemini era and the first realisation of the vision we had when we formed Google DeepMind earlier this year. This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company,” Sundar Pichai, CEO of Google and Alphabet, said in the blog post announcing the launch.
According to Google DeepMind, Gemini Ultra outperforms GPT-4 on 30 of the 32 widely used academic benchmark tests measuring capabilities such as image understanding or mathematical reasoning.
In particular, Google says Gemini Ultra’s score of 90 percent on MMLU (massive multitask language understanding), a benchmark that uses a combination of 57 subjects such as math, physics, history, law, medicine, and ethics to test both world knowledge and problem-solving abilities, makes it the first AI model to surpass human experts on that benchmark.
Moreover, Google said Gemini Ultra scored 59.4 percent on the new MMMU benchmark, which consists of multimodal tasks spanning different domains that require deliberate reasoning. It even outperformed previous models on image benchmarks without assistance from optical character recognition (OCR) systems, which extract text from images for further processing.
Availability of Gemini AI
Google says the Pro version is now available in the Bard chatbot, in English, in more than 170 countries and territories, with plans to expand to different modalities and support new languages and locations soon. Starting December 13, developers and enterprise customers will be able to access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI.
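For developers planning to try that access, the snippet below is a minimal sketch of what a Gemini Pro call through Google AI Studio could look like in Python. The google-generativeai package, the "gemini-pro" model identifier, and the method names shown are assumptions about the client library and may differ from what Google ships when the API opens on December 13.

```python
# Minimal sketch of calling Gemini Pro via the Gemini API (assumed SDK surface).
import google.generativeai as genai

# Authenticate with an API key generated in Google AI Studio (placeholder value).
genai.configure(api_key="YOUR_API_KEY")

# Select the Gemini Pro model and send a single text prompt.
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Summarize the Gemini 1.0 announcement in two sentences."
)

# Print the generated text from the response.
print(response.text)
```

Access through Google Cloud Vertex AI follows the same idea but presumably relies on Google Cloud’s own client libraries and authentication flow rather than an AI Studio API key.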
Google is also bringing Gemini Nano to the Pixel 8 Pro smartphone and plans to integrate Gemini into Search, Ads, Chrome, and other services in the coming months. Additionally, Android developers will be able to build with Gemini Nano via AICore, a new system capability in Android 14 that debuts on Pixel 8 Pro devices and is initially available as an early preview.
Lastly, Google plans to release its most advanced version of the model, Gemini Ultra, through Bard Advanced starting in early 2024. It will first be available to select customers, developers, partners, and safety and responsibility experts “for early experimentation and feedback” before rolling out to developers and enterprise customers early next year.