Gemini 3.5 Live Translate

Gemini 3.5 Live Translate is Google’s near real-time speech translation model for developers, Google Meet, and the Google Translate app. It supports 70+ languages and is designed to produce natural-sounding translated audio during live conversations.

What Gemini 3.5 Live Translate does

Gemini 3.5 Live Translate is Google’s audio model for live speech-to-speech translation. It detects more than 70 languages, processes speech as it streams in, and generates translated audio that trails the speaker by only a few seconds.

Google is rolling out the model across three product surfaces: developers can access it through the Gemini Live API and Google AI Studio, enterprises can use it in private preview in Google Meet, and consumers can use it in the Google Translate app on Android and iOS. The model is aimed at live interpretation scenarios where low latency, natural voice output, and multilingual conversation matter.

Key capabilities

Automatic language detection

The model automatically detects 70+ languages and can translate live speech without manual language setup, which reduces friction in multilingual conversations.

Near real-time speech-to-speech translation

Instead of waiting for a full turn to finish, the model translates continuously and stays only a few seconds behind the speaker, helping conversations sound more natural.
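To illustrate why continuous translation shortens the wait, here is a toy timing model. It is not Google's implementation: the chunk length, processing time, and function names are illustrative assumptions, and the source only states that the model stays a few seconds behind the speaker.

```python
from typing import List


def turn_based_first_audio_ms(turn_ms: int, processing_ms: int) -> int:
    """Time until the listener hears anything when translation
    waits for the speaker's full turn to finish."""
    return turn_ms + processing_ms


def streaming_audio_times_ms(turn_ms: int, chunk_ms: int, processing_ms: int) -> List[int]:
    """Times at which each translated chunk becomes audible when
    speech is translated chunk by chunk as it arrives."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    times = []
    start = 0
    while start < turn_ms:
        # The last chunk may be shorter than chunk_ms.
        end = min(start + chunk_ms, turn_ms)
        times.append(end + processing_ms)
        start = end
    return times


# Assumed values for illustration only (all in milliseconds).
turn_ms = 20_000       # the speaker talks for 20 seconds
chunk_ms = 2_000       # hypothetical streaming chunk size
processing_ms = 500    # hypothetical per-chunk translation time

print(turn_based_first_audio_ms(turn_ms, processing_ms))
streamed = streaming_audio_times_ms(turn_ms, chunk_ms, processing_ms)
print(streamed[0], streamed[-1])
```

With these assumed numbers, the listener hears the first translated audio after 2.5 seconds when streaming, compared with 20.5 seconds when the system waits for the whole turn. Both approaches finish at the same time, but the streaming listener follows along throughout.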

Natural-sounding voice output

Google says the translated voice preserves the speaker’s intonation, pacing, and pitch, which helps the output sound closer to the original delivery.

Streaming and noise robustness

The model is designed to handle streamed speech and noisy, unpredictable environments, making it suitable for live calls, meetings, lessons, and broadcasts.

Developer access through Gemini Live API

Google says the model can be used through the Gemini Live API and Google AI Studio, and that partners such as Agora, Fishjam, LiveKit, Pipecat, and Vision Agents use those APIs to build voice translation apps.

SynthID watermarking

Google says all generated audio is watermarked with SynthID so AI-generated speech remains detectable.

Where it fits

Multilingual live conversations

Use the model for live interpretation when two or more speakers need to keep talking naturally across language barriers, such as interviews, calls, or cross-border conversations.

Business meetings in Google Meet

Teams running meetings in Google Workspace can use the Meet integration for speech translation during business calls, with access described as private preview at launch.

Custom translation apps

Developers can build voice translation experiences in Google AI Studio or through the Gemini Live API, including apps that plug into real-time media infrastructure.

Mobile translation in the Translate app

People using the Google Translate app on Android or iOS can use Live translate for more seamless translation on the go, including headphones-based listening and Android earpiece mode.

Live audio sessions in public or noisy settings

The model is suitable for settings such as lessons, broadcasts, and noisy environments where streaming translation and low latency matter more than turn-based transcription.

Pros and Cons

Pros

Supports 70+ languages and automatically detects the language being spoken.
Continuously translates speech instead of waiting for full turns, which reduces awkward pauses.
Preserves voice characteristics such as intonation, pacing, and pitch in the translated output.
Available across multiple surfaces, including developer tools, Meet, and the Google Translate app.
Supports use in noisy, unpredictable environments and streamed audio workflows.

Cons

The launch post does not provide standalone pricing or a public general-availability date for every surface.
Google Meet access is described as private preview for select business Workspace customers at launch, so it is not broadly open immediately.
The source does not spell out setup steps, device requirements, or all supported workflows in detail.

FAQ

Where is Gemini 3.5 Live Translate available?

Google says Gemini 3.5 Live Translate is rolling out on three surfaces: to developers in public preview through the Gemini Live API and Google AI Studio, to enterprises in private preview in Google Meet, and to everyone in the Google Translate app on Android and iOS. The launch post presents these products and previews as the starting point of the rollout.

How does it work in Google Meet?

In Google Meet, speech translation will soon use 3.5 Live Translate. Google says the update will offer 70+ languages, support over 2,000 language combinations in one meeting, and update the interface for instant access to speech translation.
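As a rough sanity check on the "over 2,000 language combinations" figure, the sketch below counts pairs among 70 languages. It assumes exactly 70 languages and that a combination means a pair of distinct languages. Google does not say how it counts combinations, so this is one interpretation, not the official derivation, and the function name is hypothetical.

```python
from math import comb


def language_pairs(n_languages: int, directed: bool = False) -> int:
    """Count pairs of distinct languages among n_languages.

    directed=False counts each pair once (English/Spanish).
    directed=True counts each direction separately
    (English to Spanish and Spanish to English).
    """
    if n_languages < 0:
        raise ValueError("n_languages must be non-negative")
    if directed:
        return n_languages * (n_languages - 1)
    return comb(n_languages, 2)


print(language_pairs(70))                 # 2415
print(language_pairs(70, directed=True))  # 4830
```

Under these assumptions, 70 languages give 2,415 unordered pairs, which is consistent with "over 2,000". Counting each translation direction separately would give 4,830.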

How is it exposed in Google Translate?

In Google Translate, the Live translate feature uses the model globally on Android and iOS. Google says users can connect any pair of headphones for a more seamless experience, and Android users are also getting a listening mode that streams translated audio through the phone’s earpiece.

Does the source mention pricing?

The launch post does not list standalone pricing. What it does state is availability by product surface and preview tier: public preview for developers, private preview for select business Google Workspace customers, and a global rollout in Google Translate.

Are there any safety or output safeguards?

Google notes that all audio generated by its models is watermarked with SynthID, and it points readers to the model card for safety and responsibility details.

Quick Facts

Category: AI translation / audio model
Platform: Google AI Studio, Gemini Live API, Google Meet, Google Translate on Android and iOS
Languages: 70+ languages
Availability: Public preview for developers; private preview for select Workspace customers in Meet; global rollout in Google Translate
Source domain: blog.google
Output: Near real-time speech-to-speech translated audio

Gemini 3.5 Live Translate Alternatives

Wallie

Wallie is an open-source AI streamer that watches your screen, hears chat, and generates live commentary in a configurable persona. It runs locally on your machine with your own keys and is aimed at faceless content, autonomous streams, and real-time reactions.

Sanota

Sanota is an app that turns spoken memories, reflections, and interviews into clear written stories. It supports personal storytelling, family history, and shared memories, with guided prompts and subscription pricing.

Carbon Voice

Carbon Voice is an asynchronous voice messaging app for teams and individuals, with transcripts, AI catch-up, and cross-device access. It helps people and agents communicate without needing a live call.

Dictro

Dictro is a private AI dictation app for Mac that turns speech into polished text in any editable app. It works on Apple Silicon Macs, processes locally by default, and can also run in an optional cloud Fast Mode.

BeFreed

BeFreed is a personalized audio learning app that turns books and other knowledge sources into narrated listening experiences. It helps people learn on demand through interactive audio, voice selection, and built-in learning tools.

QuickQuill

QuickQuill is a macOS dictation and transcription app that runs locally on the device. It helps users record meetings, transcribe audio, generate summaries, and export notes without using a cloud service.