Question 1

What is multi-shot generation in Kling v3.0 Text-to-Video?

Accepted Answer

A single prompt produces up to five distinct video shots in one inference pass. The shots form a continuous edited sequence. This enables narrative multi-scene videos without manual clip stitching.

Question 2

What is the maximum output duration?

Accepted Answer

Up to 15 seconds total across all shots, extended from 10 seconds in earlier Kling versions.
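
The two limits stated so far (at most five shots per prompt, at most 15 seconds in total) can be checked before a prompt is written. The following is a minimal sketch in Python, not part of any Kling or AI Gateway API: `validate_shot_plan`, the constants, and the idea of planning with a list of per-shot durations are all hypothetical names and assumptions made for illustration.

```python
MAX_SHOTS = 5             # "up to five distinct video shots" (from the FAQ)
MAX_TOTAL_SECONDS = 15.0  # "up to 15 seconds total across all shots" (from the FAQ)


def validate_shot_plan(shot_durations):
    """Return a list of problems with a planned shot list; empty means it fits.

    shot_durations: per-shot lengths in seconds. This is a hypothetical
    planning input; how durations are actually requested from the model
    is not covered by this FAQ.
    """
    problems = []
    if not shot_durations:
        problems.append("plan has no shots")
    if len(shot_durations) > MAX_SHOTS:
        problems.append(
            f"{len(shot_durations)} shots exceeds the limit of {MAX_SHOTS}"
        )
    if any(d <= 0 for d in shot_durations):
        problems.append("every shot needs a positive duration")
    total = sum(shot_durations)
    if total > MAX_TOTAL_SECONDS:
        problems.append(
            f"total of {total:g}s exceeds the limit of {MAX_TOTAL_SECONDS:g}s"
        )
    return problems
```

For example, a three-shot plan of 3, 4, and 5 seconds passes, while four 4-second shots fail on total duration even though the shot count is within the limit.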

Question 3

What languages does the native audio generation support in v3.0?

Accepted Answer

English, Chinese, Japanese, Korean, and Spanish, including accented variants.

Question 4

How does v3.0 t2v differ from v2.6 t2v?

Accepted Answer

Compared to v2.6, v3.0 adds multi-shot narrative generation, extends the maximum duration to 15 seconds, improves physics simulation, and expands audio language support.

Question 5

How is total video cost calculated for multi-shot output?

Accepted Answer

Current rates are listed on this page. Because multiple providers can serve Kling v3.0 Text-to-Video, AI Gateway surfaces live pricing rather than a single fixed figure.

Question 6

Does v3.0 t2v support vertical video output?

Accepted Answer

Yes. The 9:16 aspect ratio is supported alongside 16:9 and 1:1.
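
When target pixel dimensions are already known, it can help to check which of the three ratios named in this answer they correspond to. The sketch below assumes only those three ratios; `matching_ratio` and `NAMED_RATIOS` are hypothetical names for this example, and the FAQ does not say whether other ratios are also accepted.

```python
from fractions import Fraction

# The three aspect ratios named in the FAQ answer (width:height).
NAMED_RATIOS = {
    "16:9": Fraction(16, 9),
    "9:16": Fraction(9, 16),
    "1:1": Fraction(1, 1),
}


def matching_ratio(width, height):
    """Return the label of the named ratio matching these dimensions, or None."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive integers")
    actual = Fraction(width, height)  # exact comparison, no float rounding
    for label, ratio in NAMED_RATIOS.items():
        if actual == ratio:
            return label
    return None
```

Using exact fractions avoids floating-point rounding, so 1080 by 1920 maps to 9:16 and 1280 by 1024 maps to none of the three.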

Kling v3.0 Text-to-Video

Frequently Asked Questions