Alibaba positioned Wan v2.6 Reference-to-Video as China's first reference-to-video generation model. The core idea is simple: feed the model a short video clip of a person, and it extracts enough about their face, body, clothing, voice, and movement style to convincingly place them into an entirely new scene described by text.
What sets the standard R2V apart within the Wan lineup is its emphasis on reconstruction depth. The model devotes additional inference time to faithfully reproducing fine-grained identity signals: the specific way light falls across facial features, subtle mannerisms in how a subject moves, the particular resonance of a voice. For final-delivery video where a client or audience will scrutinize whether the generated character truly matches the reference, this level of fidelity matters.
The reference pipeline accepts multiple characters from reference images or videos. On AI Gateway, pass reference URLs in order and name them character1, character2, and so on in the prompt (see the reference-to-video docs). You can mix images and videos within provider limits (up to five references total), and each video reference can run 2 to 30 seconds. Output duration is 2 to 10 seconds at 720p or 1080p, with five aspect ratio options covering landscape, portrait, square, and intermediate formats.
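To make the request shape concrete, here is a minimal TypeScript sketch of a multi-reference call. The endpoint path, model id, and field names are illustrative assumptions rather than the documented contract; only the constraints above (reference order mapping to character1, character2, and so on, the five-reference cap, and the duration/resolution ranges) come from the docs.

```ts
// Hypothetical sketch: the endpoint, model id, and JSON field names below are
// assumptions for illustration -- consult the reference-to-video docs for the
// actual AI Gateway contract.

async function generateReferenceToVideo(apiKey: string) {
  // References are passed in order; the prompt refers to them as
  // character1, character2, ... matching that order.
  const references = [
    "https://example.com/alice-clip.mp4",    // character1 (video, 2-30s)
    "https://example.com/bob-portrait.png",  // character2 (image)
  ];

  const body = {
    model: "alibaba/wan-v2.6-r2v", // illustrative model id
    prompt:
      "character1 and character2 walk through a rainy neon-lit street, " +
      "talking and laughing, cinematic handheld camera",
    references,           // up to five image/video URLs total
    duration: 8,          // output: 2-10 seconds
    resolution: "1080p",  // 720p or 1080p
    aspect_ratio: "16:9", // one of five supported ratios
  };

  const res = await fetch(
    "https://ai-gateway.example.com/v1/video/generations", // placeholder URL
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  );
  if (!res.ok) throw new Error(`generation failed: ${res.status}`);
  return res.json(); // typically a job id or a URL to the finished clip
}
```

The detail worth internalizing is the ordering contract: the prompt's characterN labels bind to the Nth reference URL, so reordering the references silently reassigns identities in the output.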