Wan v2.5 Text-to-Video Preview

Wan v2.5 Text-to-Video Preview provides early access to Alibaba Cloud's text-to-video rendering pipeline, generating clips up to 10 seconds long at resolutions from 480p to 1080p with built-in audio synchronization. Your use is subject to Alibaba Cloud's Terms & Privacy Policies.

text-to-video

Use with AI Gateway

index.ts

import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'alibaba/wan-v2.5-t2v-preview',
  prompt: 'A serene mountain lake at sunrise.'
});

About Wan v2.5 Text-to-Video Preview

Wan v2.5 Text-to-Video Preview marked Alibaba Cloud's initial public release of the Wan text-to-video architecture. Given only a free-form text prompt, it produces single-shot video clips up to 10 seconds long, with output available at 480p, 720p, or 1080p across three aspect ratios: landscape (16:9), portrait (9:16), and square (1:1).
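The limits described above (clips up to 10 seconds, three resolutions, three aspect ratios) can be checked on the client before a generation call is made. The following is a minimal sketch of such a check. The `validateClipRequest` helper, the `ClipRequest` shape, and its field names are hypothetical and are not part of the AI SDK or of Alibaba Cloud's API; only the numeric and enumerated limits come from the description above.

```typescript
// Hypothetical pre-flight check. None of these names belong to a real SDK;
// only the limits (10 s, 480p/720p/1080p, 16:9 / 9:16 / 1:1) come from the text.
interface ClipRequest {
  prompt: string;
  durationSeconds: number;
  resolution: string;
  aspectRatio: string;
}

const MAX_DURATION_SECONDS = 10;
const SUPPORTED_RESOLUTIONS: readonly string[] = ['480p', '720p', '1080p'];
const SUPPORTED_ASPECT_RATIOS: readonly string[] = ['16:9', '9:16', '1:1'];

// Returns a list of human-readable problems; an empty list means the
// request fits within the documented limits.
function validateClipRequest(req: ClipRequest): string[] {
  const problems: string[] = [];

  if (req.prompt.trim().length === 0) {
    problems.push('prompt must not be empty');
  }
  // Written as a negated range check so that NaN is also rejected.
  if (!(req.durationSeconds > 0 && req.durationSeconds <= MAX_DURATION_SECONDS)) {
    problems.push(`duration must be between 0 and ${MAX_DURATION_SECONDS} seconds`);
  }
  if (!SUPPORTED_RESOLUTIONS.includes(req.resolution)) {
    problems.push(`resolution must be one of ${SUPPORTED_RESOLUTIONS.join(', ')}`);
  }
  if (!SUPPORTED_ASPECT_RATIOS.includes(req.aspectRatio)) {
    problems.push(`aspect ratio must be one of ${SUPPORTED_ASPECT_RATIOS.join(', ')}`);
  }

  return problems;
}

// Example: a request that stays within every documented limit.
const okRequest: ClipRequest = {
  prompt: 'A serene mountain lake at sunrise.',
  durationSeconds: 5,
  resolution: '720p',
  aspectRatio: '16:9',
};

// Example: a request that exceeds the duration limit and uses
// an unsupported resolution and aspect ratio.
const badRequest: ClipRequest = {
  prompt: 'A serene mountain lake at sunrise.',
  durationSeconds: 30,
  resolution: '4k',
  aspectRatio: '4:3',
};

console.log(validateClipRequest(okRequest));
console.log(validateClipRequest(badRequest));
```

Returning a list of problems instead of throwing on the first one lets a caller report every out-of-range setting at once, which can be convenient when iterating on prompt and resolution choices during evaluation.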

What set this release apart from many early text-to-video models was its integrated audio generation. Rather than rendering silent video and requiring a separate dubbing pass, the 2.5 pipeline synthesizes ambient sound, effects, and even prompted character dialogue with lip-sync, all within a single generation call. For workflows that need audio-visual output, this removes an entire post-processing step.

The preview designation means the model is intended primarily for evaluation and prototyping. Teams can use it to develop prompt strategies, validate resolution and aspect ratio choices, and estimate costs at the 480p tier before scaling up to the production-grade Wan 2.6 models.
