
Kling v3.0 Text-to-Video

Kling v3.0 Text-to-Video is Kling's text-to-video model. It offers multi-shot narrative generation, physics-aware motion, native multilingual audio, and up to 15 seconds of output from a single prompt.

text-to-video, multi-shot, audio-generation
index.ts
import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'klingai/kling-v3.0-t2v',
  prompt: 'A serene mountain lake at sunrise.',
});

Frequently Asked Questions

  • What is multi-shot generation in Kling v3.0 Text-to-Video?

    A single prompt produces up to five distinct video shots in one inference pass, assembled into a continuous edited sequence. This makes it possible to create multi-scene narrative videos without stitching clips together by hand.

  • What is the maximum output duration?

    Up to 15 seconds total across all shots, extended from 10 seconds in earlier Kling versions.

  • What languages does the native audio generation support in v3.0?

    English, Chinese, Japanese, Korean, and Spanish, including accented variants.

  • How does v3.0 t2v differ from v2.6 t2v?

    Compared to v2.6, v3.0 adds multi-shot narrative generation, extends the maximum duration to 15 seconds, improves physics simulation, and expands audio language support.

  • How is total video cost calculated for multi-shot output?

    Current rates are listed on this page. Because multiple providers can serve Kling v3.0 Text-to-Video, AI Gateway surfaces live pricing rather than a single fixed figure.

  • Does v3.0 t2v support vertical video output?

    Yes. The 9:16 aspect ratio is supported alongside 16:9 and 1:1.
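The limits stated in these answers (at most five shots, 15 seconds in total, and the 16:9, 9:16, and 1:1 aspect ratios) can be checked before a request is sent. The following is a minimal sketch under stated assumptions: the `ShotPlan` shape, the per-shot `seconds` field, and the `validatePlan` helper are hypothetical and are not part of the AI SDK or AI Gateway API. This page also does not confirm that the model accepts per-shot timing at all.

```typescript
// Hypothetical client-side pre-check mirroring the limits described in the FAQ.
// ShotPlan and validatePlan are illustrative assumptions, not a real SDK API.

type AspectRatio = '16:9' | '9:16' | '1:1';

interface ShotPlan {
  shots: { description: string; seconds: number }[]; // assumed per-shot timing
  aspectRatio: AspectRatio;
}

const MAX_SHOTS = 5; // "up to five distinct video shots"
const MAX_TOTAL_SECONDS = 15; // "up to 15 seconds total across all shots"
// Runtime list of supported ratios, for input that did not pass through the type checker.
const SUPPORTED_RATIOS: readonly string[] = ['16:9', '9:16', '1:1'];

// Returns a list of problems; an empty list means the plan fits the documented limits.
function validatePlan(plan: ShotPlan): string[] {
  const errors: string[] = [];

  if (plan.shots.length === 0) {
    errors.push('at least one shot is required');
  }
  if (plan.shots.length > MAX_SHOTS) {
    errors.push(`too many shots: ${plan.shots.length} > ${MAX_SHOTS}`);
  }
  if (plan.shots.some((s) => s.seconds <= 0)) {
    errors.push('every shot needs a positive duration');
  }

  // Sum the assumed per-shot durations and compare against the 15-second cap.
  const total = plan.shots.reduce((sum, s) => sum + s.seconds, 0);
  if (total > MAX_TOTAL_SECONDS) {
    errors.push(`total duration ${total}s exceeds ${MAX_TOTAL_SECONDS}s`);
  }

  if (!SUPPORTED_RATIOS.includes(plan.aspectRatio)) {
    errors.push(`unsupported aspect ratio: ${plan.aspectRatio}`);
  }

  return errors;
}
```

For example, a vertical (9:16) plan with a 5-second and a 10-second shot passes. A plan with six 2-second shots is flagged for its shot count even though it stays under 15 seconds. Returning a list instead of throwing lets a caller report every problem at once.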