Wan

Wan is an AI creative platform offering text-to-image, image-to-image, text-to-video, image-to-video, and image editing, designed to lower the barrier to creation.

What is Wan AI?

Wan (wan.video) is an AI creative platform designed to help lower the barrier to creative work using artificial intelligence. It provides multiple generation and editing workflows—covering text-to-image, image-to-image, text-to-video, image-to-video, and image editing—so creators can move between ideas, references, and final visuals.

The core purpose of Wan is to support end-to-end creative iteration: starting from text prompts or existing images, generating new visuals or video clips, and refining results with image editing capabilities.

Key Features

  • Text-to-image generation: Create images from written prompts to turn ideas into visual starting points.
  • Image-to-image generation: Generate new images based on an existing image input, useful for variations or style/reference changes.
  • Text-to-video generation: Produce video content from text prompts when you want a narrative or motion concept expressed in words.
  • Image-to-video generation: Generate video from an input image to extend a still reference into a moving output.
  • Image editing: Edit existing images within the same creative workflow, supporting refinement after generation.

How to Use Wan AI

  1. Choose the workflow that matches your input: text-to-image, image-to-image, text-to-video, image-to-video, or image editing.
  2. Provide the required input (a prompt and/or an image), then run the generation or editing step.
  3. Iterate by adjusting prompts or using different image inputs to refine the output toward your intended result.
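The choice in step 1 follows directly from what you have and what you want to produce. As a sketch, the decision can be expressed as a small helper function; the function name and workflow labels below are illustrative only and are not part of any Wan API.

```python
def choose_workflow(has_prompt: bool, has_image: bool,
                    wants_video: bool = False,
                    wants_edit: bool = False) -> str:
    """Pick the Wan workflow matching your inputs.

    Illustrative helper, not part of any official Wan API:
    image inputs route to image-to-* or editing workflows,
    text-only inputs route to text-to-* workflows.
    """
    if wants_edit and has_image:
        return "image editing"
    if has_image:
        return "image-to-video" if wants_video else "image-to-image"
    if has_prompt:
        return "text-to-video" if wants_video else "text-to-image"
    raise ValueError("Provide a text prompt and/or an input image.")
```

For example, starting with only a prompt and wanting motion maps to text-to-video, while starting from a reference still maps to image-to-video.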

Use Cases

  • Concepting for visuals: Start with a text prompt to generate a set of image options before committing to a final direction.
  • Style and reference adaptation: Use image-to-image generation to keep a recognizable composition while changing style or attributes.
  • From script to motion: Draft a description of a scene in text and use text-to-video generation to obtain video results for review.
  • Animating a reference: Convert a specific still image into a video using image-to-video generation when you want motion based on a known composition.
  • Post-generation refinement: Apply image editing after generating an image to correct or adjust details while keeping the overall look.

FAQ

What types of media can Wan generate or edit? Wan supports image generation (text-to-image and image-to-image), video generation (text-to-video and image-to-video), and image editing.

Do I need an existing image to use Wan? Not always. You can use text-to-image or text-to-video when you want to start from prompts, or use image-to-image / image-to-video when you have a reference image.

How do I choose between image-to-image and image editing? Image-to-image generates a new image using an existing image as input, while image editing modifies the existing image itself within the editing workflow.

Can Wan help me move from a still image to video? Yes. The platform includes image-to-video generation, which is designed for creating video from an input image.

Alternatives

  • Other AI image/video generation platforms: These similarly offer prompt-based image or video generation, typically emphasizing one workflow (images or videos) more than cross-workflow iteration.
  • Creative tools with generative features: Applications that combine image/video creation with editing can be a fit if you want a single workspace for both generation and refinement, though capabilities may differ by media type.
  • Specialized video generation tools: Tools focused primarily on text-to-video or image-to-video workflows may streamline video output when your priority is motion generation over image editing.
  • Image editing software with AI-assisted effects: For teams that already have strong pipelines in traditional editing, AI-assisted editing tools can supplement manual workflows, though they may not provide the same generation breadth across image and video modes.