
Wan v2.6 Reference-to-Video Flash

Wan v2.6 Reference-to-Video Flash is Alibaba's fast reference-to-video model that preserves subject identity from video references and generates new scenes at speed, supporting 720p and 1080p output for rapid creative iteration.

reference-to-video
index.ts
import { experimental_generateVideo as generateVideo } from 'ai';

// Generate a clip with Wan v2.6 R2V Flash via AI Gateway.
const result = await generateVideo({
  model: 'alibaba/wan-v2.6-r2v-flash',
  prompt: 'A serene mountain lake at sunrise.',
});

Playground

Try out Wan v2.6 Reference-to-Video Flash by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About Wan v2.6 Reference-to-Video Flash

Wan v2.6 Reference-to-Video Flash shares the same core capability as the standard R2V model, extracting a subject's visual and vocal identity from a reference video and placing them into text-prompted new scenes, but is built for contexts where generation speed is the binding constraint. The Flash architecture reduces wait times substantially, enabling creative teams to iterate through prompt variations and scene configurations much faster than the standard model allows.

The reference extraction pipeline works the same way: pass reference URLs alongside a descriptive prompt, and use character1, character2, and so on in the prompt to match the order of the URLs. References can be images or videos; each video reference must run 2 to 30 seconds, and the total number of references is subject to provider limits. The model extracts appearance, movement style, and voice characteristics and applies them to the generated output. Because the Flash variant is optimized for speed rather than peak reconstruction fidelity, identity preservation may lose some fine detail relative to the standard R2V output.
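The URL-order convention above can be sketched as a small helper. The function below is hypothetical (not part of the AI SDK); it only illustrates how characterN placeholders in the prompt line up with the reference URL list by position.

```typescript
// Pair reference URLs with characterN placeholders in prompt order.
// characterN in the prompt refers to referenceUrls[N - 1].
function buildReferencePrompt(
  referenceUrls: string[],
  scene: string,
): { prompt: string; references: string[] } {
  const cast = referenceUrls
    .map((_, i) => `character${i + 1}`)
    .join(' and ');
  return {
    prompt: `${cast}: ${scene}`,
    references: referenceUrls,
  };
}

const request = buildReferencePrompt(
  ['https://example.com/talent-a.mp4', 'https://example.com/talent-b.mp4'],
  'walking through a neon-lit market at night',
);
// request.prompt begins with "character1 and character2: ..."
```

How the references and prompt are ultimately passed to the model depends on the provider's request schema; the point here is only that placeholder numbering follows URL order.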

R2V Flash supports 720p and 1080p resolutions and output durations of 2 to 10 seconds, making it suitable for social content previews, storyboard animatics, or any workflow where a director needs rapid visual confirmation that a scene concept works before ordering full-quality renders.
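The output constraints above can be checked before submitting a request. This is an illustrative sketch, not confirmed SDK parameters: the option names are assumptions, but the bounds (720p/1080p, 2 to 10 seconds) come from this page.

```typescript
// Validate R2V Flash output settings before making a request.
// Option names are illustrative; the limits mirror the documented ranges.
type R2vFlashOptions = {
  resolution: '720p' | '1080p'; // no 480p tier on the R2V variants
  durationSeconds: number; // 2 to 10 seconds
};

function validateR2vFlashOptions(opts: R2vFlashOptions): R2vFlashOptions {
  if (opts.durationSeconds < 2 || opts.durationSeconds > 10) {
    throw new RangeError('durationSeconds must be between 2 and 10');
  }
  return opts;
}
```

Failing fast on an out-of-range duration avoids burning credits on a request the provider would reject anyway.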

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider: Alibaba
Legal: Terms, Privacy
Release Date: 12/16/2025
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.


What To Consider When Choosing a Provider

  • Configuration: When a production deliverable demands maximum identity fidelity from the reference material, evaluate the standard wan-v2.6-r2v model before finalizing your pipeline choice.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
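Because the Gateway handles provider credentials, request code only needs its own key or deployment identity. A minimal sketch of resolving that credential from the environment; the variable name AI_GATEWAY_API_KEY is an assumption based on common Gateway conventions, so confirm it against your deployment docs.

```typescript
// Resolve Gateway credentials from the environment. No provider-specific
// keys (e.g. Alibaba credentials) are needed in application code.
function resolveGatewayAuth(env: Record<string, string | undefined>): string {
  const key = env['AI_GATEWAY_API_KEY']; // assumed variable name
  if (!key) {
    // On managed deployments, an OIDC token may stand in for an API key.
    throw new Error('Set AI_GATEWAY_API_KEY or rely on OIDC in deployment');
  }
  return key;
}
```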

When to Use Wan v2.6 Reference-to-Video Flash

Best For

  • Fast scene concept iteration: Verifying character identity before committing to full-quality R2V renders
  • High-volume social content: Pipelines where volume and speed of character-consistent clips outweigh pixel-perfect fidelity
  • Storyboards and animatics: Live-action or animation pre-production using real talent references
  • High-throughput brand content: Generating many assets quickly where a mascot or spokesperson must appear consistently

Consider Alternatives When

  • Maximum identity fidelity: Use wan-v2.6-r2v for the highest-quality character transfer in final deliveries
  • Still-photo source material: Use wan-v2.6-i2v-flash for image-based animation when the source is a still photo
  • No reference subject needed: Use wan-v2.6-t2v for purely text-prompted video generation
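The decision rules above can be sketched as a simple selector. The model slugs follow this page's naming; whether each carries the alibaba/ prefix in your Gateway is an assumption to verify, and the branching logic is illustrative rather than an official recommendation engine.

```typescript
// Pick a Wan v2.6 variant from the job's requirements, per the rules above.
type Job = {
  hasReference: boolean; // does a subject need to be preserved?
  sourceIsStillImage: boolean; // photo rather than video reference
  finalDelivery: boolean; // final output vs draft iteration
};

function pickWanModel(job: Job): string {
  if (!job.hasReference) return 'alibaba/wan-v2.6-t2v';
  if (job.sourceIsStillImage) return 'alibaba/wan-v2.6-i2v-flash';
  return job.finalDelivery
    ? 'alibaba/wan-v2.6-r2v' // maximum identity fidelity
    : 'alibaba/wan-v2.6-r2v-flash'; // fast iteration
}
```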

Conclusion

Wan v2.6 Reference-to-Video Flash makes identity-consistent video generation fast enough for iterative creative workflows, preserving the reference-to-video capability that makes the R2V series unique while dramatically shortening generation time. It fits naturally into pipelines where R2V Flash handles draft cycles and the standard R2V model handles final output.

Frequently Asked Questions

  • What makes R2V Flash different from the standard R2V model?

Flash is speed-optimized. It generates reference-consistent video much faster than the standard R2V model, with a potential tradeoff in peak identity fidelity. Use Flash for drafts and iteration; use the standard R2V model for final output.

  • Does R2V Flash support the same reference video format as standard R2V?

    Yes. Both accept the same reference URL lists and prompt conventions: use character1, character2, and so on in the prompt, in URL order, with 2 to 30 seconds per video reference where applicable.

  • What resolutions does R2V Flash support?

    720p and 1080p. The R2V variants don't include a 480p option.

  • What is the maximum generated video length?

    Output duration is 2 to 10 seconds for Wan R2V on AI Gateway. The 15-second option available on some T2V and I2V models does not apply here.

  • Can R2V Flash handle multiple characters from different reference clips in one scene?

    Yes. You can combine several reference URLs in one request (within provider limits) and name them character1, character2, and so on in the prompt.

  • Is audio included in generated output?

    Voice and audio characteristics captured from the reference clips are part of the identity extraction process; check provider-level documentation for specific audio output behavior.