Wan v2.5 Text-to-Video Preview
Wan v2.5 Text-to-Video Preview provides early access to Alibaba's text-to-video rendering pipeline, generating clips up to 10 seconds at resolutions from 480p to 1080p with built-in audio synchronization.
```typescript
import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'alibaba/wan-v2.5-t2v-preview',
  prompt: 'A serene mountain lake at sunrise.',
});
```

Playground
Try out Wan v2.5 Text-to-Video Preview by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About Wan v2.5 Text-to-Video Preview
Wan v2.5 Text-to-Video Preview marked Alibaba's initial public release of the Wan text-to-video architecture. Given only a free-form text prompt, it produces single-shot video clips up to 10 seconds long, with output available at 480p, 720p, or 1080p across three aspect ratios: landscape (16:9), portrait (9:16), and square (1:1).
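The supported output tiers above can be captured in a small client-side check before a request is issued. This is a minimal sketch: the helper and its validation approach are illustrative, not part of the provider's API.

```typescript
// Supported output tiers for Wan v2.5 T2V Preview, per the description above.
const RESOLUTIONS = ['480p', '720p', '1080p'] as const;
const ASPECT_RATIOS = ['16:9', '9:16', '1:1'] as const;

// Guard a request's settings before sending it to the gateway.
function isSupported(resolution: string, aspectRatio: string): boolean {
  return (
    (RESOLUTIONS as readonly string[]).includes(resolution) &&
    (ASPECT_RATIOS as readonly string[]).includes(aspectRatio)
  );
}
```

Checking locally avoids a round trip that would fail on an unsupported combination such as 4K output.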
What set this release apart from many early text-to-video models was its integrated audio generation. Rather than rendering silent video and requiring a separate dubbing pass, the 2.5 pipeline synthesizes ambient sound, effects, and even prompted character dialogue with lip-sync, all within a single generation call. For workflows that need audio-visual output, this removes an entire post-processing step.
The preview designation means the model is intended primarily for evaluation and prototyping. Teams can use it to develop prompt strategies, validate resolution and aspect ratio choices, and estimate costs at the 480p tier before scaling up to the production-grade Wan 2.6 models.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
What To Consider When Choosing a Provider
- Latency: Preview-tier models may exhibit higher generation latency than their stable counterparts. If your workflow has strict turnaround requirements, benchmark generation times at your target resolution before integrating into production.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
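Authentication is typically configured once in the environment rather than per request; a minimal setup might look like the following (the variable name follows Vercel's AI Gateway convention, and on Vercel deployments an OIDC token can stand in for the key):

```shell
# The AI SDK picks up the gateway key from the environment automatically.
export AI_GATEWAY_API_KEY="your-key-here"
```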
When to Use Wan v2.5 Text-to-Video Preview
Best For
- Evaluating the Wan pipeline: Exploring Alibaba's text-to-video system without committing to the higher-capability 2.6 series
- Short social clips: Generating videos up to 10 seconds from text descriptions, particularly in portrait 9:16 format
- Baked-in audio workflows: Use cases that need audio in the generated video without a separate dubbing tool
- 480p cost exploration: Running cost tests at the lower resolution tier before budgeting for higher-fidelity output
Consider Alternatives When
- Longer or multi-shot clips: Wan-v2.6-t2v extends to 15 seconds with multi-shot storytelling and automatic scene transitions
- Image-to-video conversion: Wan-v2.6-i2v and Wan-v2.6-i2v-flash handle animation from a source image
- Consistent character identity: Wan-v2.6-r2v provides reference-based identity transfer across scenes
Conclusion
Wan v2.5 Text-to-Video Preview remains a useful on-ramp for developers entering the Alibaba video generation ecosystem. Its integrated audio pipeline, flexible resolution options, and preview-tier positioning make it well-suited for prototyping and low-resolution production work.
Frequently Asked Questions
Why would I choose the v2.5 preview over the newer v2.6 model?
The v2.5 preview supports 480p output (which v2.6 T2V doesn't), making it a lower-cost option for draft-quality renders and prompt experimentation. It also serves as a lighter-weight entry point for teams still evaluating the Wan pipeline.
Can this model generate vertical video for mobile platforms?
Yes. The 9:16 aspect ratio produces portrait-oriented output suitable for platforms like TikTok, Instagram Reels, and YouTube Shorts.
How does the built-in audio feature work?
Audio is generated in the same rendering pass as the video. The model produces ambient sound, effects, and, if the prompt describes speech, character dialogue with lip-sync, all without requiring a separate audio generation tool.
What kind of text prompts work best with this model?
Descriptive scene prompts that specify setting, action, and mood tend to produce the most coherent output. Including details about lighting, camera angle, and desired audio cues gives the model more information to work with.
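One way to keep prompts consistently structured is to assemble them from those elements. This is a hypothetical helper, not part of the model or SDK; the field names are illustrative.

```typescript
// Hypothetical prompt builder covering the elements the model responds to:
// setting, action, and optionally mood and audio cues.
function buildPrompt(parts: {
  setting: string;
  action: string;
  mood?: string;
  audio?: string;
}): string {
  return [
    parts.setting,
    parts.action,
    parts.mood,
    parts.audio && `Audio: ${parts.audio}`, // surface audio cues explicitly
  ]
    .filter(Boolean)
    .join('. ');
}
```

A structured builder like this makes it easy to A/B test prompt variations (for example, with and without audio cues) during the preview-tier evaluation phase.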
Is there a way to control video duration precisely?
You can request specific durations within the model's range. The maximum is 10 seconds; for longer output, use the Wan v2.6 T2V model.
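Requested durations can be clamped to the model's range before the request is built. A minimal sketch: the 10-second ceiling comes from the model description above, while the 1-second floor is an assumption.

```typescript
// Clamp a requested duration (in seconds) to the preview model's range.
// Max of 10 s is documented; the 1 s minimum is an assumed lower bound.
function clampDuration(seconds: number, min = 1, max = 10): number {
  return Math.min(max, Math.max(min, seconds));
}
```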
Do I need special Vercel plan access to use this model?
Access requires a Pro or Enterprise plan or paid AI Gateway usage.