June 14, 2024

Solid State Lighting Design


Google Gemini 1.5 Pro can now hear


Google's update to Gemini 1.5 Pro gives the model ears. It can now listen to uploaded audio files and extract information from sources such as earnings calls or the audio track of a video, without needing a written transcript.

At its Google Cloud Next event, Google also announced that it will make Gemini 1.5 Pro available to the public for the first time through Vertex AI, its platform for building AI applications. Gemini 1.5 Pro was first announced in February.

This new version of Gemini Pro, which is meant to be the mid-size model of the Gemini family, already outperforms the larger and more powerful Gemini Ultra. Google claims that Gemini 1.5 Pro can understand complex instructions and eliminates the need to fine-tune models.

Gemini 1.5 Pro is not available to people without access to Vertex AI. For now, most people encounter Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot, and while it is powerful and can also understand long instructions, it is not as fast as Gemini 1.5 Pro.

Gemini 1.5 Pro isn't the only big AI model from Google to get an update. Imagen 2, the text-to-image model that powers Gemini's image generation capabilities, will also gain inpainting and outpainting, which let users add or remove elements from images. Google has also made its SynthID digital watermarking feature available for all images created with Imagen models. SynthID embeds a watermark in an image that is invisible to the viewer but identifies the image's source when it is checked with a detection tool.


Google says it is also publicly previewing a way to ground its AI responses in Google Search so they can draw on up-to-date information. That is not always a given with responses produced by large language models, and sometimes the limits are deliberate: Google intentionally blocked Gemini from answering questions related to the 2024 US elections.