Kling v2.6 Motion Control

klingai/kling-v2.6-motion-control

Kling v2.6 Motion Control transfers full-body motion from a 3-30 second reference clip to a generated scene, capturing gestures, facial expressions, lip-sync, and camera movement with frame-accurate fidelity.

Video Gen
index.ts
import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'klingai/kling-v2.6-motion-control',
  prompt: 'A serene mountain lake at sunrise.',
});

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
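Since the gateway expects a credential at request time, it can help to fail fast when none is configured. A minimal sketch, assuming the key is exposed through an `AI_GATEWAY_API_KEY` environment variable (your deployment may use an OIDC token instead):

```typescript
// Sketch: resolve the gateway credential before making any calls.
// The AI_GATEWAY_API_KEY variable name is an assumption; check your
// gateway's authentication documentation for the exact mechanism.
function resolveGatewayKey(env: Record<string, string | undefined>): string {
  const key = env.AI_GATEWAY_API_KEY;
  if (!key || key.trim() === '') {
    throw new Error(
      'Missing AI_GATEWAY_API_KEY: set an API key or use OIDC authentication.',
    );
  }
  return key;
}

// Usage:
// const apiKey = resolveGatewayKey(process.env);
```

Checking the credential once at startup surfaces configuration problems immediately rather than as failed generation requests later.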

Reference video quality drives transfer accuracy. Use clear subjects, stable framing, and well-lit motion.
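Because the reference clip must fall in the 3-30 second window, a pre-flight duration check can reject invalid input before a generation call is made. A sketch (how you obtain the duration, e.g. via ffprobe or a media library, is up to your pipeline):

```typescript
// Sketch: validate a reference clip against the documented 3-30 second
// window. The duration value itself must come from your own probing step.
const MIN_REFERENCE_SECONDS = 3;
const MAX_REFERENCE_SECONDS = 30;

function isValidReferenceDuration(durationSeconds: number): boolean {
  return (
    Number.isFinite(durationSeconds) &&
    durationSeconds >= MIN_REFERENCE_SECONDS &&
    durationSeconds <= MAX_REFERENCE_SECONDS
  );
}
```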

When to Use Kling v2.6 Motion Control

Best For

  • Dance and performance content:

    A reference performance transfers to a generated character or setting

  • Talking-head dialogue:

    Videos that need accurate facial expression and lip-sync replication

  • Camera movement transfer:

    Camera behavior from a reference shot carries into the new scene

  • Fashion and product video:

    Production using human movement references in controlled studio footage

Consider Alternatives When

  • No reference video:

    You don't have a reference motion video and prefer purely text-driven generation

  • Simple image animation:

    You want to animate an image into a short video without motion transfer; the i2v variants are a better fit

  • Multi-shot narratives:

    You need narrative generation with independent scene segments; v3.0 is a better fit

Conclusion

Kling v2.6 Motion Control transfers a precise movement sequence to a generated scene without manual animation or frame-by-frame keyframing. For character performance, staged shots, dance, or action, it moves hands, face, and camera together from the reference.

FAQ

How long should the reference video be?

The reference video should be 3-30 seconds long. Clearer subject visibility and stable framing produce more accurate motion transfer in the output.

Does camera movement transfer as well as body motion?

Yes. Camera behavior (pan, push, pull, and rotation) in the reference clip replicates in the generated video, not just subject body motion.

How does the model handle fast or intricate motion?

The model reduces artifacts on fast, intricate motions. Hand articulation and high-speed body movements render with improved fidelity compared to earlier motion transfer approaches.

How long can generated outputs be?

Outputs can reach up to 30 seconds. This eliminates the need to stitch multiple short clips together for longer sequences.

Do facial expressions and lip-sync transfer?

Yes. Facial expression tracking and lip-sync alignment transfer from the reference and apply to the generated subject.

Is a text prompt required?

A text prompt is optional. Use it to describe the desired scene, subject, and styling for the output. Motion derives from the reference clip while the prompt defines what appears in the new video.
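Since the prompt is optional while the reference clip is what drives motion, request parameters can be assembled accordingly. A sketch of building such a parameter object; the provider option name for the reference video (`referenceVideoUrl` here) is an assumption, so check the provider documentation for the actual field:

```typescript
// Sketch: combine an optional text prompt with a required reference clip.
// 'referenceVideoUrl' is a hypothetical option name, not a confirmed API field.
interface MotionControlParams {
  model: string;
  prompt?: string;
  providerOptions: { klingai: { referenceVideoUrl: string } };
}

function buildMotionControlParams(
  referenceVideoUrl: string,
  prompt?: string,
): MotionControlParams {
  const params: MotionControlParams = {
    model: 'klingai/kling-v2.6-motion-control',
    providerOptions: { klingai: { referenceVideoUrl } },
  };
  // The prompt only shapes scene, subject, and styling; motion comes
  // from the reference clip, so the prompt can be omitted entirely.
  if (prompt) params.prompt = prompt;
  return params;
}
```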