Wan v2.5 Text-to-Video Preview

alibaba/wan-v2.5-t2v-preview

Wan v2.5 Text-to-Video Preview provides early access to Alibaba's text-to-video rendering pipeline, generating clips up to 10 seconds at resolutions from 480p to 1080p with built-in audio synchronization.

text-to-video
index.ts
import { experimental_generateVideo as generateVideo } from 'ai';

// Request a clip from the preview model using a text prompt.
const result = await generateVideo({
  model: 'alibaba/wan-v2.5-t2v-preview',
  prompt: 'A serene mountain lake at sunrise.',
});

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway does not currently support Zero Data Retention (ZDR) for this model. See the documentation for models that support ZDR.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

Preview-tier models may exhibit higher generation latency than their stable counterparts. If your workflow has strict turnaround requirements, benchmark generation times at your target resolution before integrating into production.
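One way to run such a benchmark is sketched below. It is a minimal illustration, not part of the AI SDK: `summarizeTimings` and `benchmarkGeneration` are hypothetical helper names, and the `generate` callback stands in for whatever call you actually make (for example, a `generateVideo` request at your target resolution).

```typescript
// Hypothetical helpers for timing repeated generation calls.
// Nothing here is part of the AI SDK; `generate` is any async function you supply.

interface TimingSummary {
  runs: number;
  minMs: number;
  maxMs: number;
  meanMs: number;
}

// Reduce a list of elapsed times (in milliseconds) to simple summary statistics.
function summarizeTimings(samplesMs: number[]): TimingSummary {
  if (samplesMs.length === 0) {
    throw new RangeError('at least one sample is required');
  }
  const total = samplesMs.reduce((sum, ms) => sum + ms, 0);
  return {
    runs: samplesMs.length,
    minMs: Math.min(...samplesMs),
    maxMs: Math.max(...samplesMs),
    meanMs: total / samplesMs.length,
  };
}

// Run `generate` sequentially `runs` times and record wall-clock time per run.
async function benchmarkGeneration(
  generate: () => Promise<unknown>,
  runs = 3,
): Promise<TimingSummary> {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError('runs must be a positive integer');
  }
  const samplesMs: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    await generate();
    samplesMs.push(Date.now() - start);
  }
  return summarizeTimings(samplesMs);
}
```

Running the calls sequentially keeps each measurement independent of the others. Compare the slowest run, not only the mean, against your turnaround budget, since preview-tier latency can vary between requests.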

When to Use Wan v2.5 Text-to-Video Preview

Best For

  • Evaluating the Wan pipeline:

    Exploring Alibaba's text-to-video system without committing to the higher-capability 2.6 series

  • Short social clips:

    Generating videos up to 10 seconds from text descriptions, particularly in portrait 9:16 format

  • Baked-in audio workflows:

    Use cases that need audio in the generated video without a separate dubbing tool

  • 480p cost exploration:

    Running cost tests at the lower resolution tier before budgeting for higher-fidelity output

Consider Alternatives When

  • Longer or multi-shot clips:

    The wan-v2.6-t2v model extends to 15 seconds with multi-shot storytelling and automatic scene transitions

  • Image-to-video conversion:

    The wan-v2.6-i2v and wan-v2.6-i2v-flash models handle animation from a source image

  • Consistent character identity:

    The wan-v2.6-r2v model provides reference-based identity transfer across scenes

Conclusion

Wan v2.5 Text-to-Video Preview remains a useful on-ramp for developers entering the Alibaba video generation ecosystem. Its integrated audio pipeline, flexible resolution options, and preview-tier positioning make it well-suited for prototyping and low-resolution production work.

FAQ

The v2.5 preview supports 480p output (which v2.6 T2V doesn't), making it a lower-cost option for draft-quality renders and prompt experimentation. It also serves as a lighter-weight entry point for teams still evaluating the Wan pipeline.

The model supports vertical video: the 9:16 aspect ratio produces portrait-oriented output suitable for platforms like TikTok, Instagram Reels, and YouTube Shorts.

Audio is generated in the same rendering pass as the video. The model produces ambient sound, effects, and, if the prompt describes speech, character dialogue with lip-sync, all without requiring a separate audio generation tool.

Descriptive scene prompts that specify setting, action, and mood tend to produce the most coherent output. Including details about lighting, camera angle, and desired audio cues gives the model more information to work with.
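The prompt guidance above can be made repeatable with a small builder. This is a sketch under assumptions: the `ScenePrompt` shape, the `buildPrompt` name, and the "Label: value." layout are all hypothetical conventions chosen for illustration. The model accepts free-form text and does not require this format.

```typescript
// Hypothetical prompt builder; the field names and label format are
// illustrative conventions, not something the model requires.

interface ScenePrompt {
  setting: string;
  action: string;
  mood?: string;
  lighting?: string;
  cameraAngle?: string;
  audioCues?: string;
}

// Trim whitespace and trailing periods so sentences join cleanly.
function clean(value?: string): string {
  return (value ?? '').trim().replace(/\.+$/, '');
}

// Join whichever details are present into one descriptive prompt string.
function buildPrompt(scene: ScenePrompt): string {
  const labeled: Array<[string, string | undefined]> = [
    ['Setting', scene.setting],
    ['Action', scene.action],
    ['Mood', scene.mood],
    ['Lighting', scene.lighting],
    ['Camera', scene.cameraAngle],
    ['Audio', scene.audioCues],
  ];
  return labeled
    .map(([label, value]) => [label, clean(value)] as const)
    .filter(([, value]) => value.length > 0)
    .map(([label, value]) => `${label}: ${value}.`)
    .join(' ');
}

const prompt = buildPrompt({
  setting: 'a serene mountain lake at sunrise',
  action: 'mist drifts across the water',
  lighting: 'soft golden light',
  audioCues: 'gentle birdsong',
});
// prompt === 'Setting: a serene mountain lake at sunrise. Action: mist drifts across the water. Lighting: soft golden light. Audio: gentle birdsong.'
```

Keeping the details in a structured object makes it easier to vary one element at a time, such as lighting or camera angle, when experimenting with prompts.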

You can request specific durations within the model's range. The maximum is 10 seconds; for longer output, use the Wan v2.6 T2V model.
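The limits described on this page (10 seconds for v2.5, 15 seconds for v2.6 T2V, and no 480p on v2.6 T2V) can be encoded in a small routing helper. The sketch below is illustrative only: `pickWanModel` is a hypothetical function, the `alibaba/wan-v2.6-t2v` identifier is assumed by analogy with the v2.5 ID, and `720p` is assumed to be an available tier between 480p and 1080p.

```typescript
// Hypothetical routing helper based on the limits stated on this page.
// '720p' is an assumed intermediate tier; the page only gives the 480p to 1080p range.
type Resolution = '480p' | '720p' | '1080p';

const WAN_25_T2V_PREVIEW = 'alibaba/wan-v2.5-t2v-preview';
// Assumed identifier, formed by analogy with the v2.5 model ID above.
const WAN_26_T2V = 'alibaba/wan-v2.6-t2v';

function pickWanModel(durationSeconds: number, resolution: Resolution): string {
  if (!Number.isFinite(durationSeconds) || durationSeconds <= 0) {
    throw new RangeError('durationSeconds must be a positive number');
  }
  // v2.5 preview covers clips up to 10 seconds at any listed resolution.
  if (durationSeconds <= 10) {
    return WAN_25_T2V_PREVIEW;
  }
  // v2.6 T2V extends to 15 seconds but does not offer 480p.
  if (durationSeconds <= 15) {
    if (resolution === '480p') {
      throw new RangeError('480p is only available on v2.5, which is limited to 10 seconds');
    }
    return WAN_26_T2V;
  }
  throw new RangeError('neither model described here supports clips longer than 15 seconds');
}

const draftModel = pickWanModel(8, '480p');   // 'alibaba/wan-v2.5-t2v-preview'
const longModel = pickWanModel(12, '1080p');  // 'alibaba/wan-v2.6-t2v'
```

This helper always prefers the v2.5 preview for clips of 10 seconds or less. If you want v2.6 features such as multi-shot storytelling on shorter clips, adjust the first branch accordingly.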

Access requires either a Pro or Enterprise plan, or paid AI Gateway usage.