Wan v2.6 Reference-to-Video Flash
Wan v2.6 Reference-to-Video Flash is Alibaba's fast reference-to-video model that preserves subject identity from video references and generates new scenes at speed, supporting 720p and 1080p output for rapid creative iteration.
import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'alibaba/wan-v2.6-r2v-flash',
  prompt: 'A serene mountain lake at sunrise.',
});

About Wan v2.6 Reference-to-Video Flash
Wan v2.6 Reference-to-Video Flash shares the same core capability as the standard R2V model: it extracts a subject's visual and vocal identity from a reference video and places that subject into new scenes described by a text prompt. The difference is that Flash is built for contexts where generation speed is the binding constraint. The Flash architecture reduces wait times substantially, so creative teams can iterate through prompt variations and scene configurations faster than the standard model allows.
The reference extraction pipeline works the same way as in the standard model: pass reference URLs alongside a descriptive prompt, and refer to the subjects in the prompt as character1, character2, and so on, numbered in the same order as the URLs. References can be images or videos; each video reference must be 2 to 30 seconds long, and the total number of references is subject to provider limits. The model extracts appearance, movement style, and voice characteristics and applies them to the generated output. Because the Flash variant is optimized for speed rather than peak reconstruction fidelity, fine details in identity preservation may differ slightly from the standard R2V output.
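The ordering rule above can be checked before a request is sent. The following is a minimal sketch, not part of the AI SDK or the provider API: the `Reference` type and the `characterLabel` and `checkReferences` helpers are hypothetical names introduced for illustration, and the caller is assumed to already know each video's duration. It checks only the two constraints stated in the text, the 2 to 30 second range for video references and the use of character1, character2, and so on in URL order.

```typescript
// Hypothetical helper type for illustration; not part of the AI SDK or provider API.
type Reference = {
  url: string;
  kind: 'image' | 'video';
  durationSeconds?: number; // assumed to be known by the caller for video references
};

// Limits on a single video reference, as stated in the text.
const MIN_REF_SECONDS = 2;
const MAX_REF_SECONDS = 30;

// Label the prompt should use for the reference at a 0-based position in the URL list.
function characterLabel(index: number): string {
  return `character${index + 1}`;
}

// Returns a list of human-readable problems; an empty list means none were found.
function checkReferences(prompt: string, refs: Reference[]): string[] {
  const problems: string[] = [];
  refs.forEach((ref, i) => {
    const label = characterLabel(i);
    if (ref.kind === 'video') {
      const d = ref.durationSeconds;
      if (d === undefined || d < MIN_REF_SECONDS || d > MAX_REF_SECONDS) {
        problems.push(
          `${label}: video reference must be ${MIN_REF_SECONDS} to ${MAX_REF_SECONDS} seconds`,
        );
      }
    }
    // Word-boundary match so "character1" is not found inside "character10".
    if (!new RegExp(`\\b${label}\\b`).test(prompt)) {
      problems.push(`${label}: not mentioned in the prompt`);
    }
  });
  return problems;
}
```

For example, calling `checkReferences` with a prompt that mentions character1 and character2 and a two-entry reference list in matching order returns an empty array, while a 45-second video reference produces a duration problem for its label. How the reference URLs are actually attached to a `generateVideo` call is not shown here, since the source does not specify it.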
R2V Flash supports 720p and 1080p resolutions and output durations of 2 to 10 seconds, making it suitable for social content previews, storyboard animatics, or any workflow where a director needs rapid visual confirmation that a scene concept works before ordering full-quality renders.
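The output limits above can be captured in a small guard that runs before a generation request. This is a sketch under stated assumptions: `SUPPORTED_RESOLUTIONS` and `checkOutputSettings` are hypothetical names, not SDK or provider identifiers, and the values encode only what the text states (720p or 1080p, 2 to 10 seconds).

```typescript
// Values taken from the text; the names are hypothetical and not part of any SDK.
const SUPPORTED_RESOLUTIONS = ['720p', '1080p'] as const;
const MIN_OUTPUT_SECONDS = 2;
const MAX_OUTPUT_SECONDS = 10;

// Returns a list of problems with the requested output settings; empty means acceptable.
function checkOutputSettings(resolution: string, durationSeconds: number): string[] {
  const problems: string[] = [];
  if (!(SUPPORTED_RESOLUTIONS as readonly string[]).includes(resolution)) {
    problems.push(`resolution must be one of ${SUPPORTED_RESOLUTIONS.join(', ')}`);
  }
  if (
    !Number.isFinite(durationSeconds) ||
    durationSeconds < MIN_OUTPUT_SECONDS ||
    durationSeconds > MAX_OUTPUT_SECONDS
  ) {
    problems.push(`duration must be ${MIN_OUTPUT_SECONDS} to ${MAX_OUTPUT_SECONDS} seconds`);
  }
  return problems;
}
```

Failing fast on the client side avoids spending a generation request on settings the model does not support, which matters most in the rapid-iteration workflows this variant targets.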