Wan v2.6 Text-to-Video

Wan v2.6 Text-to-Video is the production-grade text-to-video model in Alibaba's Wan series, generating cinematic clips up to 15 seconds with automatic multi-shot scene composition and native audio at resolutions up to 1080p.

text-to-video
index.ts
import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'alibaba/wan-v2.6-t2v',
  prompt: 'A serene mountain lake at sunrise.',
});

Frequently Asked Questions

  • What makes multi-shot storytelling different from generating multiple separate clips?

    The model handles scene transitions internally, maintaining character and object consistency across cuts. This produces a single cohesive video rather than requiring you to generate and stitch together separate clips with potential visual discontinuities.

  • Which aspect ratios are new in the 2.6 release?

    The 2.6 model adds 4:3 and 3:4 to the existing 16:9, 9:16, and 1:1 options, covering traditional broadcast and print-adjacent formats that the 2.5 preview did not support.

  • How does the temporal stability improvement affect output quality?

    The 2.6 generation reduces frame-to-frame flicker on high-frequency visual details. Text overlays, logos, hair, and fine textures render more cleanly than in the 2.5 preview, producing more professional-looking output.

  • Can I specify the number or timing of scene cuts in a multi-shot video?

    The model determines scene structure automatically based on the prompt content. You influence the number of scenes through the narrative structure of your prompt rather than through an explicit parameter.

  • What duration options are available?

    Clips can be generated at 5, 10, or 15 seconds. The 15-second option is the longest in the Wan 2.6 series.

  • Is 480p output available with this model?

    No. Wan v2.6 Text-to-Video supports 720p and 1080p only. For 480p draft-quality work or cost savings, use the Wan v2.5 T2V Preview.
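
The limits described in the answers above (durations of 5, 10, or 15 seconds, 720p or 1080p output, and five aspect ratios) can be checked on the client before a request is sent. The following is a minimal sketch of such a check, not part of any SDK: the `ClipSettings` type, the `checkClipSettings` function, and the field names are hypothetical and chosen only for illustration. How these settings are actually passed to the model is not covered here.

```typescript
// Allowed values taken from the FAQ above. All names in this block are
// hypothetical helpers for illustration, not part of the 'ai' package.
const DURATIONS_SECONDS: readonly number[] = [5, 10, 15];
const RESOLUTIONS: readonly string[] = ['720p', '1080p'];
const ASPECT_RATIOS: readonly string[] = ['16:9', '9:16', '1:1', '4:3', '3:4'];

interface ClipSettings {
  durationSeconds: number;
  resolution: string;
  aspectRatio: string;
}

// Returns a list of human-readable problems. An empty list means the
// settings fall within the limits documented on this page.
function checkClipSettings(settings: ClipSettings): string[] {
  const problems: string[] = [];

  if (!DURATIONS_SECONDS.some((d) => d === settings.durationSeconds)) {
    problems.push(
      'duration must be one of ' + DURATIONS_SECONDS.join(', ') + ' seconds',
    );
  }
  if (!RESOLUTIONS.some((r) => r === settings.resolution)) {
    problems.push('resolution must be one of ' + RESOLUTIONS.join(', '));
  }
  if (!ASPECT_RATIOS.some((a) => a === settings.aspectRatio)) {
    problems.push('aspect ratio must be one of ' + ASPECT_RATIOS.join(', '));
  }

  return problems;
}

// Usage: a 480p request is flagged with a single problem (the resolution),
// since this model supports 720p and 1080p only.
const draftProblems = checkClipSettings({
  durationSeconds: 15,
  resolution: '480p',
  aspectRatio: '16:9',
});
console.log(draftProblems);
```

Returning a list of problems rather than throwing on the first one lets a caller report every invalid setting at once.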