
Wan v2.6 Image-to-Video

Wan v2.6 Image-to-Video is Alibaba's image-to-video model that animates still images into high-fidelity video clips up to 1080p and 15 seconds, with optional audio and precise motion control from text guidance.

image-to-video
index.ts
import { experimental_generateVideo as generateVideo } from 'ai';

// Request a clip from the gateway model, addressed by its slug.
const result = await generateVideo({
  model: 'alibaba/wan-v2.6-i2v',
  prompt: 'A serene mountain lake at sunrise.',
});

Playground

Try out Wan v2.6 Image-to-Video by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider    Release Date
Alibaba     12/16/2025

The remaining columns (Context, Latency, Throughput, Input, Output, Cache, Web Search, Per Query, Capabilities, ZDR, No Training) show no values for this model.

About Wan v2.6 Image-to-Video

Wan v2.6 Image-to-Video is the quality-focused image-animation model in Alibaba's 2.6-generation video lineup. It accepts a source image of up to 100MB, between 360px and 2000px on each dimension, paired with a descriptive text prompt that guides the motion, timing, and scene direction. From those inputs it produces video at 480p, 720p, or 1080p in five aspect ratios.
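
The input limits above can be checked locally before an image is submitted. The following is a minimal sketch, not part of any SDK: the names `SourceImageInfo` and `validateSourceImage` are hypothetical, and it assumes "100MB" means 100 MiB and that the pixel limits are inclusive.

```typescript
// Limits quoted in the description above. All names here are illustrative.
const MIN_SIDE_PX = 360;
const MAX_SIDE_PX = 2000;
const MAX_FILE_BYTES = 100 * 1024 * 1024; // assumption: "100MB" read as 100 MiB

interface SourceImageInfo {
  width: number; // pixels
  height: number; // pixels
  sizeBytes: number; // file size in bytes
}

// Returns a list of human-readable problems; an empty list means the image
// fits the documented limits.
function validateSourceImage(image: SourceImageInfo): string[] {
  const problems: string[] = [];
  const sides: Array<[string, number]> = [
    ['width', image.width],
    ['height', image.height],
  ];
  for (const [name, value] of sides) {
    if (value < MIN_SIDE_PX || value > MAX_SIDE_PX) {
      problems.push(`${name} ${value}px is outside ${MIN_SIDE_PX}-${MAX_SIDE_PX}px`);
    }
  }
  if (image.sizeBytes > MAX_FILE_BYTES) {
    problems.push(`file is ${image.sizeBytes} bytes, over the ${MAX_FILE_BYTES}-byte limit`);
  }
  return problems;
}
```

Returning a list of problems rather than a boolean lets a caller report every violation at once, for example an image that is both too narrow and too large on disk.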

The 2.6 generation introduced substantial upgrades over its predecessors, including improved temporal consistency (less flicker between frames), sharper fine detail retention from the source image, and better instruction-following when the text prompt specifies particular motions or environmental conditions. Audio integration is optionally available, allowing ambient sound or scene audio to accompany the animated output.

Unlike the R2V models, I2V doesn't attempt to extract a character identity for reuse across multiple shots; it animates the submitted image as a single continuous visual source. Multi-shot mode is disabled by default, which keeps the output as a single unbroken clip, appropriate for product showcases, still-life animations, and short narrative scenes anchored to one visual composition.

What To Consider When Choosing a Provider

  • Configuration: For rapid iteration and draft-quality previews, consider wan-v2.6-i2v-flash, which trades some visual fidelity for significantly faster generation times.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
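
The two authentication options can be pictured as a small lookup over environment variables. This is a sketch under assumptions: the variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` are not stated on this page, the function `resolveGatewayCredential` is hypothetical, and preferring the API key over the OIDC token is a choice made for this example only.

```typescript
// Which kind of credential was found, plus its value.
type GatewayCredential =
  | { kind: 'api-key'; token: string }
  | { kind: 'oidc'; token: string };

// Looks for a gateway credential in an environment-like record.
// Assumed variable names; check the gateway documentation for the real ones.
function resolveGatewayCredential(
  env: Record<string, string | undefined>,
): GatewayCredential {
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (apiKey) {
    return { kind: 'api-key', token: apiKey };
  }
  const oidcToken = env['VERCEL_OIDC_TOKEN'];
  if (oidcToken) {
    return { kind: 'oidc', token: oidcToken };
  }
  throw new Error('No AI Gateway credential found: set an API key or provide an OIDC token.');
}
```

In a Node.js process the function could be called as `resolveGatewayCredential(process.env)`. Either way, no provider-specific credentials appear anywhere in the lookup.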

When to Use Wan v2.6 Image-to-Video

Best For

  • Animated product advertising: Turning product photography into short video ads without a source video reference
  • Bringing static art to life: Animating concept art, illustrations, or architectural renders with guided motion
  • Single-hero-image social clips: Portrait or landscape outputs driven by a motion-direction prompt
  • High-fidelity 1080p animation: Workflows where visual quality outweighs generation speed

Consider Alternatives When

  • Speed over peak quality: Use wan-v2.6-i2v-flash for faster turnaround at 720p or 1080p during iteration
  • Text-only video generation: wan-v2.6-t2v generates video from a text description without a source image
  • Cross-scene character consistency: wan-v2.6-r2v preserves a character's identity across multiple generated scenes
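
The guidance in the two lists above can be condensed into a small selection helper. This is a rough sketch: `VideoTask` and `pickWanModel` are hypothetical names, and only `alibaba/wan-v2.6-i2v` appears as a full slug on this page. The `alibaba/` prefix on the other three slugs is assumed by analogy.

```typescript
// What the caller starts from, and (for image input) what they care about most.
type VideoTask =
  | { source: 'image'; priority: 'quality' | 'speed' }
  | { source: 'text' }
  | { source: 'character-reference' };

// Maps a task description to a model slug, following the lists above.
function pickWanModel(task: VideoTask): string {
  switch (task.source) {
    case 'image':
      // Flash trades some visual fidelity for faster generation.
      return task.priority === 'speed'
        ? 'alibaba/wan-v2.6-i2v-flash' // assumed slug prefix
        : 'alibaba/wan-v2.6-i2v';
    case 'text':
      return 'alibaba/wan-v2.6-t2v'; // assumed slug prefix
    case 'character-reference':
      return 'alibaba/wan-v2.6-r2v'; // assumed slug prefix
  }
}
```

For example, `pickWanModel({ source: 'image', priority: 'quality' })` returns `'alibaba/wan-v2.6-i2v'`, the model described on this page.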

Conclusion

Wan v2.6 Image-to-Video transforms static images into polished animated video clips with precise text-guided motion control, supporting the full resolution range up to 1080p and durations up to 15 seconds. It is the quality-first choice within the Wan I2V lineup for teams where output fidelity outweighs generation speed.

Frequently Asked Questions

  • What image formats and sizes are accepted?

    The model accepts images between 360px and 2000px on each dimension, with a maximum file size of 100MB.

  • Can I control the direction of motion with a text prompt?

    Yes. A text prompt accompanying the image guides the generated motion, camera direction, and scene atmosphere.

  • Does Wan v2.6 Image-to-Video support audio output?

    Yes, optionally. When audio is enabled, ambient sound can accompany the generated video.

  • How long can generated clips be?

    Clips can be 5, 10, or 15 seconds long, making this the longest-output option among the I2V variants.

  • What aspect ratios are supported?

    The model supports 16:9, 9:16, 1:1, 4:3, and 3:4, the widest aspect ratio selection in the Wan 2.6 I2V lineup.

  • When should I choose I2V Flash instead?

    If you are prototyping or need fast feedback on motion ideas, the flash variant offers faster generation. For final-quality deliverables at 1080p, the standard I2V model is recommended.
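
The durations, aspect ratios, and resolutions listed on this page can be gathered into one pre-flight check. This is a minimal sketch: `ClipSettings` and `checkClipSettings` are hypothetical names, not request parameters of any SDK, and it assumes every combination of the listed values is allowed, which this page does not confirm.

```typescript
// Values quoted on this page.
const DURATIONS_SECONDS: readonly number[] = [5, 10, 15];
const ASPECT_RATIOS: readonly string[] = ['16:9', '9:16', '1:1', '4:3', '3:4'];
const RESOLUTIONS: readonly string[] = ['480p', '720p', '1080p'];

// Illustrative settings shape; field names are not taken from any API.
interface ClipSettings {
  durationSeconds: number;
  aspectRatio: string;
  resolution: string;
}

// Returns one message per setting that is not in the documented lists.
function checkClipSettings(settings: ClipSettings): string[] {
  const problems: string[] = [];
  if (!DURATIONS_SECONDS.includes(settings.durationSeconds)) {
    problems.push(`duration must be one of ${DURATIONS_SECONDS.join(', ')} seconds`);
  }
  if (!ASPECT_RATIOS.includes(settings.aspectRatio)) {
    problems.push(`aspect ratio must be one of ${ASPECT_RATIOS.join(', ')}`);
  }
  if (!RESOLUTIONS.includes(settings.resolution)) {
    problems.push(`resolution must be one of ${RESOLUTIONS.join(', ')}`);
  }
  return problems;
}
```

A caller might run this before submitting a job, so that an unsupported value such as a 12-second duration is caught locally instead of after a round trip.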