Kling v3.0 Text-to-Video
Kling v3.0 Text-to-Video is Kling's text-to-video model. It offers multi-shot narrative generation, physics-aware motion, native multilingual audio, and up to 15 seconds of output from a single prompt.
import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'klingai/kling-v3.0-t2v',
  prompt: 'A serene mountain lake at sunrise.',
});

Frequently Asked Questions
What is multi-shot generation in Kling v3.0 Text-to-Video?
A single prompt produces up to five distinct video shots in one inference pass. The shots form a continuous edited sequence. This enables narrative multi-scene videos without manual clip stitching.
What is the maximum output duration?
Up to 15 seconds total across all shots, extended from 10 seconds in earlier Kling versions.
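The two limits described above (at most five shots, at most 15 seconds in total) can be checked before a prompt is sent. The following is a minimal sketch, not part of the AI SDK or the Kling API: the `PlannedShot` type, the `checkShotPlan` and `toPrompt` helpers, and the "Shot N:" prompt wording are all hypothetical names introduced for illustration. Only the numeric limits come from this page.

```typescript
// Limits taken from this page: at most five shots, at most 15 seconds in total.
const MAX_SHOTS = 5;
const MAX_TOTAL_SECONDS = 15;

// Hypothetical planning type; not part of the AI SDK or the Kling API.
interface PlannedShot {
  description: string;
  seconds: number;
}

// Returns a list of problems; an empty list means the plan fits the limits.
function checkShotPlan(shots: PlannedShot[]): string[] {
  const problems: string[] = [];
  if (shots.length === 0) {
    problems.push('Plan has no shots.');
  }
  if (shots.length > MAX_SHOTS) {
    problems.push(`Plan has ${shots.length} shots; the limit is ${MAX_SHOTS}.`);
  }
  const total = shots.reduce((sum, shot) => sum + shot.seconds, 0);
  if (total > MAX_TOTAL_SECONDS) {
    problems.push(`Plan runs ${total}s; the limit is ${MAX_TOTAL_SECONDS}s.`);
  }
  return problems;
}

// Joins shot descriptions into a single prompt string.
// The "Shot N:" wording is an assumption, not a documented prompt format.
function toPrompt(shots: PlannedShot[]): string {
  return shots.map((shot, i) => `Shot ${i + 1}: ${shot.description}`).join(' ');
}
```

If `checkShotPlan` returns an empty list, the string from `toPrompt` could be passed as the `prompt` value in a call like the one at the top of this page.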
What languages does the native audio generation support in v3.0?
English, Chinese, Japanese, Korean, and Spanish, including accented variants.
How does v3.0 t2v differ from v2.6 t2v?
Compared to v2.6, v3.0 adds multi-shot narrative generation, extends the maximum duration to 15 seconds, improves physics simulation, and expands audio language support.
How is total video cost calculated for multi-shot output?
This page lists the current rates. Multiple providers can serve Kling v3.0 Text-to-Video, so AI Gateway surfaces live pricing rather than a single fixed figure.
Does v3.0 t2v support vertical video output?
Yes. The 9:16 aspect ratio is supported alongside 16:9 and 1:1.
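The three supported ratios can be captured in a small type guard so that an unsupported value is caught before a request is made. This is a sketch under stated assumptions: `SUPPORTED_ASPECT_RATIOS`, `isSupportedAspectRatio`, and `pickAspectRatio` are hypothetical helper names, and the list contains only the ratios named on this page.

```typescript
// Ratios named on this page; the model may accept others not listed here.
const SUPPORTED_ASPECT_RATIOS = ['16:9', '9:16', '1:1'] as const;
type SupportedAspectRatio = (typeof SUPPORTED_ASPECT_RATIOS)[number];

// Type guard: narrows an arbitrary string to one of the listed ratios.
function isSupportedAspectRatio(value: string): value is SupportedAspectRatio {
  return (SUPPORTED_ASPECT_RATIOS as readonly string[]).includes(value);
}

// Hypothetical helper: picks a listed ratio from the target orientation.
// Square targets map to 1:1, landscape to 16:9, portrait to 9:16.
function pickAspectRatio(width: number, height: number): SupportedAspectRatio {
  if (width === height) return '1:1';
  return width > height ? '16:9' : '9:16';
}
```

For example, `pickAspectRatio(1080, 1920)` selects the vertical `9:16` ratio for a portrait target.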