Wan v2.6 Image-to-Video

Wan v2.6 Image-to-Video is Alibaba's image-to-video model that animates still images into high-fidelity video clips up to 1080p and 15 seconds, with optional audio and precise motion control from text guidance.


index.ts

import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'alibaba/wan-v2.6-i2v',
  prompt: 'A serene mountain lake at sunrise.'
});


About Wan v2.6 Image-to-Video

Wan v2.6 Image-to-Video is the quality-focused image-animation model in Alibaba's 2.6-generation video lineup. It accepts a source image of up to 100MB, measuring between 360px and 2000px on either dimension, paired with a descriptive text prompt that guides the motion, timing, and scene direction. From those inputs it produces video at 480p, 720p, or 1080p in five aspect ratios.
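The input limits above can be checked before a request is sent. The following is a minimal sketch, not part of any SDK: the `validateSourceImage` helper and the `SourceImageInfo` type are hypothetical names introduced for illustration, the check assumes that both width and height must fall within the 360px to 2000px range, and it assumes "100MB" means 100 × 1024 × 1024 bytes.

```typescript
// Limits taken from the description above.
const MIN_SIDE_PX = 360;
const MAX_SIDE_PX = 2000;
// Assumption: "100MB" is interpreted as 100 * 1024 * 1024 bytes.
const MAX_BYTES = 100 * 1024 * 1024;

// Hypothetical shape describing an image the caller has already measured.
interface SourceImageInfo {
  width: number;     // pixels
  height: number;    // pixels
  sizeBytes: number; // file size in bytes
}

// Hypothetical helper: returns a list of problems; an empty list means
// the image fits the documented limits under the assumptions above.
function validateSourceImage(info: SourceImageInfo): string[] {
  const problems: string[] = [];

  if (info.width < MIN_SIDE_PX || info.width > MAX_SIDE_PX) {
    problems.push(`width ${info.width}px is outside ${MIN_SIDE_PX}-${MAX_SIDE_PX}px`);
  }
  if (info.height < MIN_SIDE_PX || info.height > MAX_SIDE_PX) {
    problems.push(`height ${info.height}px is outside ${MIN_SIDE_PX}-${MAX_SIDE_PX}px`);
  }
  if (info.sizeBytes > MAX_BYTES) {
    problems.push(`file size ${info.sizeBytes} bytes exceeds ${MAX_BYTES} bytes`);
  }

  return problems;
}
```

Calling `validateSourceImage({ width: 1280, height: 720, sizeBytes: 2_000_000 })` returns an empty array, so the image could be submitted. A non-empty array lists each limit that was violated, which lets the caller resize or recompress the image before making the request.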

The 2.6 generation introduced substantial upgrades over its predecessors: improved temporal consistency (less flicker between frames), sharper retention of fine detail from the source image, and better instruction-following when the text prompt specifies particular motions or environmental conditions. Audio is optional, so ambient sound or scene audio can accompany the animated output.

Unlike the R2V models, I2V doesn't attempt to extract a character identity for reuse across multiple shots; it animates the submitted image as a single continuous visual source. Multi-shot mode is disabled by default, so the output stays a single unbroken clip. That makes the model a good fit for product showcases, still-life animations, and short narrative scenes anchored to one visual composition.
