UStack
Gemini Omni

Gemini Omni is a Gemini model for creating and editing video with natural-language prompts, using video, image, text, or audio references.

What is Gemini Omni?

Gemini Omni is a Gemini model for creating and editing video through natural-language prompts. The page presents it as a system that can take a video, image, text, or audio reference as input and produce a single cohesive output, with an emphasis on iterative editing and consistency across multiple turns.

It is positioned as a model where Gemini’s reasoning and world understanding meet creation. According to the page, it is designed to support edits that build on previous instructions, change the look or action of a scene, and apply real-world knowledge when generating or transforming content.

Key Features

  • Multi-turn video editing: Users can refine a video through step-by-step conversation, with each edit building on the last to keep the scene coherent.
  • Natural-language transformation: Prompts can change the aesthetic, action, or effect in an existing video without manual timeline editing.
  • Reference-to-output workflow: The model can use image, text, video, or audio as input references and turn them into a single output.
  • World-knowledge-aware generation: The page says Gemini Omni combines an understanding of physics with Gemini’s knowledge of history, science, and culture to support more meaningful outputs.
  • Available through Gemini and Google Flow: The page repeatedly points users to try it in Gemini or in Google Flow.

How to Use Gemini Omni

Start by providing a video or another reference such as an image, text prompt, or audio. Then describe the change you want in plain language, and continue refining it with follow-up prompts if needed. The page also links to prompt guidance for users who want help shaping their requests.
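
The workflow above can be pictured as a small data model: a set of references plus an ordered list of prompts, where each new prompt builds on the ones before it. The sketch below is illustrative only. The names `Reference`, `EditSession`, and `refine`, and the file name `clip.mp4`, are hypothetical; it calls no Gemini or Google Flow API, and the page does not describe one.

```python
from dataclasses import dataclass, field

# Hypothetical model of the workflow only; nothing here calls a real service.
# The four reference kinds are the ones the page lists.
ALLOWED_KINDS = {"video", "image", "text", "audio"}


@dataclass
class Reference:
    kind: str    # one of ALLOWED_KINDS
    source: str  # file path or literal text (illustrative)

    def __post_init__(self):
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported reference kind: {self.kind}")


@dataclass
class EditSession:
    references: list
    prompts: list = field(default_factory=list)

    def refine(self, prompt: str) -> int:
        """Record a follow-up prompt and return its 1-based turn number."""
        text = prompt.strip()
        if not text:
            raise ValueError("prompt must not be empty")
        self.prompts.append(text)
        return len(self.prompts)

    def history(self) -> str:
        """Return the prompts in order, one numbered line per turn."""
        return "\n".join(f"{i}. {p}" for i, p in enumerate(self.prompts, 1))


# Start from a video reference, then refine in plain language.
session = EditSession([Reference("video", "clip.mp4")])
session.refine("Render the scene as line art.")
session.refine("Add a reflective ripple to the water.")
print(session.history())
```

Keeping the prompts in an ordered list mirrors the page's point that edits accumulate: the second prompt is read in the context of the first rather than as a fresh request.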

Use Cases

  • Scene editing by conversation: Adjust an existing video in stages, such as changing an object, effect, or action while keeping the rest of the scene consistent.
  • Style transformation: Convert the visual treatment of a video into a different look, such as line art or another illustrated aesthetic.
  • Effect design: Add or alter a specific visual effect based on a prompt, such as a reflective ripple or material transformation.
  • Reference-based creation: Combine different source materials, such as text, audio, and visuals, into one coherent generated result.
  • Concept storytelling: Use the model’s world-knowledge grounding to create videos that are not only photorealistic but also aligned with a narrative or factual idea.

FAQ

What kinds of inputs does Gemini Omni support? The page says it accepts video, image, text, or audio references, including an existing video to edit.

Can edits be made in multiple steps? Yes. The page emphasizes natural, step-by-step conversation where each edit builds on the previous one.

Does Gemini Omni only generate new videos? No. The page highlights both video creation and editing of existing video through prompts.

Where can it be tried? The page points to Gemini and Google Flow.

Alternatives

  • Traditional non-AI video editors: These are better for precise timeline control, trimming, compositing, and frame-level manual editing.
  • Other generative video models: Similar tools may focus more on text-to-video generation and less on iterative, conversation-based editing.
  • Image generation models with editing features: These are closer to still-image workflows and are not designed for video continuity across multiple turns.
  • General-purpose AI assistants with media tools: These may help with prompts or planning, but they are not specialized for video transformation and consistency in the way Gemini Omni is presented here.