
Seedance v1.5 Pro

Seedance v1.5 Pro is ByteDance's first video model with joint audio-visual generation, released December 16, 2025. It produces synchronized dialogue, sound effects, and ambient audio alongside 1080p video in one generation pass, with multilingual voice and regional dialect support.

Video Gen
index.ts
import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'bytedance/seedance-v1.5-pro',
  prompt: 'A serene mountain lake at sunrise.',
});

Playground

Try out Seedance v1.5 Pro by ByteDance. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider     Release Date
ByteDance    12/16/2025


About Seedance v1.5 Pro

Seedance v1.5 Pro shifts the Seedance line from visual generation alone to joint audio-visual creation. Released December 16, 2025, it's the first Seedance model to generate voice, sound effects, and ambient audio synchronized to video in a single inference pass. You don't run a separate text-to-speech or audio compositing step.

The audio system supports multilingual speech generation across six languages: Chinese, English, Japanese, Korean, Spanish, and Indonesian. It also covers regional dialects such as Sichuanese and Cantonese. Vocal synthesis targets prosody and intonation that track the scene, and spatial reverb in sound effects matches the visual scene's physical context. ByteDance's release cites lip movement alignment, intonation patterning, and performance rhythm synchronization as focus areas and compares them against listed baselines. See https://console.byteplus.com/ark/region:ark+ap-southeast-1/model/detail?Id=seedance-1-5-pro for tables and comparisons.

On the video side, Seedance v1.5 Pro raises the ceiling relative to Seedance 1.0 Pro. Where 1.0 Pro focused on motion stability, v1.5 Pro extends camera control and finishing. You get cinematic camera controls including continuous long takes and dolly zooms, color grading controls, more facial detail in close-ups, and richer dynamic motion. Output supports 480p, 720p, and 1080p resolution at 24 fps, with clips from 4 to 12 seconds and seven aspect ratios.
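The output limits above lend themselves to a pre-flight check before a request is sent. The following is a minimal sketch, not part of any SDK: the `ClipRequest` shape, its field names, and `validateClipRequest` are hypothetical, and the aspect-ratio list is the one given in the FAQ on this page. It only encodes the documented limits; whether durations must be whole seconds is not stated here, so the sketch does not enforce that.

```typescript
// Documented output limits for Seedance v1.5 Pro, as stated on this page.
const RESOLUTIONS = ['480p', '720p', '1080p'] as const;
const ASPECT_RATIOS = ['16:9', '9:16', '1:1', '4:3', '3:4', '21:9', '9:21'] as const;
const MIN_SECONDS = 4;
const MAX_SECONDS = 12;

// Hypothetical request shape, used for illustration only.
interface ClipRequest {
  resolution: string;
  aspectRatio: string;
  durationSeconds: number;
}

// Returns a list of human-readable problems; an empty list means the
// request stays within the documented limits.
function validateClipRequest(req: ClipRequest): string[] {
  const problems: string[] = [];

  if (!(RESOLUTIONS as readonly string[]).includes(req.resolution)) {
    problems.push(`Unsupported resolution: ${req.resolution}`);
  }
  if (!(ASPECT_RATIOS as readonly string[]).includes(req.aspectRatio)) {
    problems.push(`Unsupported aspect ratio: ${req.aspectRatio}`);
  }
  if (
    !Number.isFinite(req.durationSeconds) ||
    req.durationSeconds < MIN_SECONDS ||
    req.durationSeconds > MAX_SECONDS
  ) {
    problems.push(
      `Duration must be between ${MIN_SECONDS} and ${MAX_SECONDS} seconds`,
    );
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a caller surface every issue in a single pass, which tends to be friendlier in a form or CLI.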

What To Consider When Choosing a Provider

  • Configuration: For audio-visual workflows, confirm that your integration layer handles the combined audio-video output format before you deploy to production.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
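As an illustration of the authentication point above, the sketch below picks whichever credential is available. It rests on assumptions: the environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` are not given on this page, the precedence (API key first) is an arbitrary choice, and `resolveGatewayCredential` is a hypothetical helper rather than part of any SDK.

```typescript
// Which kind of credential was found, plus its value.
type GatewayCredential = {
  kind: 'api-key' | 'oidc';
  token: string;
};

// Looks for a gateway credential in an environment-like record.
// ASSUMPTION: the variable names below are not confirmed by this page.
// ASSUMPTION: preferring the API key over the OIDC token is a design
// choice made for this sketch only.
function resolveGatewayCredential(
  env: Record<string, string | undefined>,
): GatewayCredential {
  const apiKey = env.AI_GATEWAY_API_KEY?.trim();
  if (apiKey) {
    return { kind: 'api-key', token: apiKey };
  }

  const oidcToken = env.VERCEL_OIDC_TOKEN?.trim();
  if (oidcToken) {
    return { kind: 'oidc', token: oidcToken };
  }

  throw new Error(
    'No gateway credential found: set an API key or provide an OIDC token.',
  );
}
```

In a Node.js process this could be called as `resolveGatewayCredential(process.env)`. Taking the environment as a parameter keeps the function easy to test without touching global state.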

When to Use Seedance v1.5 Pro

Best For

  • Character-driven video content: Creative briefs that include synchronized dialogue, lip alignment, and vocal performance
  • Multilingual content production: Chinese, English, Japanese, Korean, Spanish, or Indonesian voice plus regional dialect variants
  • Cinematic short-form content: Dolly zooms, long takes, and color grading that go beyond typical social clip defaults
  • Ambient-audio storytelling: Product demos, branded content, and explainer videos where spatial sound effects match the scene without manual audio post-production

Consider Alternatives When

  • Visual-only pipelines: Seedance 1.0 Pro offers lower cost when audio isn't a requirement
  • Maximum speed and cost efficiency: Seedance 1.0 Pro Fast is the better fit when speed and cost matter more than audio or finishing
  • Unsupported languages: Verify support before committing when you need a dialect or language not yet covered by the audio system
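The guidance in the two lists above can be read as a small decision procedure. The sketch below is one possible reading, not an official selection rule: the `Brief` shape and `suggestSeedanceModel` are hypothetical, the function returns display names rather than API slugs (slugs for the 1.0 models are not given on this page), and the dialect list contains only the two examples named here, so it may be incomplete.

```typescript
// Voices documented on this page: six languages plus two example dialects.
// ASSUMPTION: the dialect list may be longer than these two examples.
const DOCUMENTED_VOICES = new Set([
  'chinese', 'english', 'japanese', 'korean', 'spanish', 'indonesian',
  'sichuanese', 'cantonese',
]);

type Suggestion =
  | 'Seedance v1.5 Pro'
  | 'Seedance 1.0 Pro'
  | 'Seedance 1.0 Pro Fast'
  | 'Verify language support first';

// Hypothetical description of a creative brief.
interface Brief {
  needsAudio: boolean;
  prioritizeSpeedAndCost: boolean;
  speechLanguage?: string; // e.g. 'English'; omit for ambient-only audio
}

function suggestSeedanceModel(brief: Brief): Suggestion {
  // Visual-only pipelines: the 1.0 models are the lower-cost options.
  if (!brief.needsAudio) {
    return brief.prioritizeSpeedAndCost
      ? 'Seedance 1.0 Pro Fast'
      : 'Seedance 1.0 Pro';
  }

  // Audio with speech in a language or dialect not documented here.
  const language = brief.speechLanguage?.trim().toLowerCase();
  if (language && !DOCUMENTED_VOICES.has(language)) {
    return 'Verify language support first';
  }

  // Audio is required and the voice (if any) is documented.
  return 'Seedance v1.5 Pro';
}
```

For example, a brief with `needsAudio: true` and `speechLanguage: 'Korean'` maps to Seedance v1.5 Pro, while a silent product loop with `prioritizeSpeedAndCost: true` maps to Seedance 1.0 Pro Fast.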

Conclusion

Seedance v1.5 Pro closes the gap between AI video generation and full audio-visual production by eliminating the post-processing step of adding synchronized audio. For any project where voice, sound design, and video must arrive together, it's the only Seedance model that handles all three in one pass.

Frequently Asked Questions

  • What languages does Seedance v1.5 Pro support for voice generation?

    Six languages: Chinese, English, Japanese, Korean, Spanish, and Indonesian. Regional dialect coverage includes Sichuanese and Cantonese.

  • Does Seedance v1.5 Pro require a separate text-to-speech step for audio?

    No. Seedance v1.5 Pro generates voice, ambient sound, and sound effects in the same inference pass as the video. You don't need an external audio pipeline.

  • How does audio-visual synchronization work in Seedance v1.5 Pro?

    Seedance v1.5 Pro is trained to align lip movements, intonation patterns, and performance rhythm with visual content. ByteDance's release documentation reports lower audio-visual misalignment than the baselines listed in its tables. See https://console.byteplus.com/ark/region:ark+ap-southeast-1/model/detail?Id=seedance-1-5-pro.

  • What video specifications does Seedance v1.5 Pro support?

    Resolutions of 480p, 720p, and 1080p at 24 fps, clip durations from 4 to 12 seconds, and seven aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4, 21:9, and 9:21.

  • How does Seedance v1.5 Pro differ from Seedance 1.0 Pro on video quality alone?

    Seedance v1.5 Pro adds cinematic camera techniques (dolly zooms, long takes), color grading controls, and more facial detail in close-ups, beyond the motion stability focus of Seedance 1.0 Pro.

  • Can Seedance v1.5 Pro generate ambient sound without spoken dialogue?

    Yes. The audio system generates spatial sound effects and ambient audio that match the visual scene's physical environment, whether or not the scene contains speech.