Kling v2.6 Motion Control
Kling v2.6 Motion Control transfers full-body motion from a 3-30 second reference clip to a generated scene, capturing gestures, facial expressions, lip-sync, and camera movement with frame-accurate fidelity.
import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'klingai/kling-v2.6-motion-control',
  prompt: 'A serene mountain lake at sunrise.',
});

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
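If you prefer to configure the key explicitly rather than relying on the environment, a minimal sketch looks like the following. It assumes the @ai-sdk/gateway provider package; by default the key is read from the AI_GATEWAY_API_KEY environment variable, or from a Vercel OIDC token on deployment.

import { experimental_generateVideo as generateVideo } from 'ai';
import { createGateway } from '@ai-sdk/gateway';

// Explicit key configuration; omit this and set AI_GATEWAY_API_KEY
// in the environment to get the same behavior.
const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY,
});

const result = await generateVideo({
  // Assumes the provider instance resolves this video model id the same
  // way the plain model string does in the quickstart above.
  model: gateway('klingai/kling-v2.6-motion-control'),
  prompt: 'A serene mountain lake at sunrise.',
});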
Reference video quality drives transfer accuracy. Use a clearly visible subject, stable framing, and well-lit footage.
When to Use Kling v2.6 Motion Control
Best For
Dance and performance content: a reference performance transfers to a generated character or setting.
Talking-head dialogue: videos that need accurate facial expression and lip-sync replication.
Camera movement transfer: camera behavior from a reference shot carries into the new scene.
Fashion and product video: production using human movement references in controlled studio footage.
Consider Alternatives When
No reference video: you don't have a reference motion clip and prefer purely text-driven generation.
Simple image animation: you want to animate a still image into a short video without motion transfer; see the i2v variants instead.
Multi-shot narratives: you need narrative generation with independent scene segments; see v3.0 instead.
Conclusion
Kling v2.6 Motion Control transfers a precise movement sequence to a generated scene without manual animation or frame-by-frame keyframing. For character performance, staged shots, dance, or action, it carries hand, facial, and camera motion from the reference as a single performance.
FAQ
How long should the reference video be?
The reference video should be 3-30 seconds long. Clearer subject visibility and stable framing produce more accurate motion transfer in the output.

Does camera movement transfer from the reference?
Yes. Camera behavior (pan, push, pull, and rotation) in the reference clip replicates in the generated video, not just subject body motion.

How does the model handle fast or intricate motion?
The model reduces artifacts on fast, intricate motions. Hand articulation and high-speed body movements render with improved fidelity compared to earlier motion transfer approaches.

How long can the generated video be?
Outputs can reach up to 30 seconds. This eliminates the need to stitch multiple short clips together for longer sequences.

Do facial expressions and lip-sync transfer?
Yes. Facial expression tracking and lip-sync alignment transfer from the reference and apply to the generated subject.

Is a text prompt required?
A text prompt is optional. Use it to describe the desired scene, subject, and styling for the output. Motion derives from the reference clip while the prompt defines what appears in the new video.
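Since motion comes from the reference clip while the prompt defines the scene, a combined call might look like the sketch below. The klingai providerOptions key and the referenceVideoUrl parameter name are hypothetical placeholders, not confirmed option names; check the model's parameter reference for the exact field.

import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'klingai/kling-v2.6-motion-control',
  // The prompt describes what appears in the new video...
  prompt: 'A bronze robot dancing on a neon-lit rooftop at night.',
  // ...while the reference clip supplies the motion to transfer.
  providerOptions: {
    klingai: {
      // Hypothetical parameter name for the 3-30 second reference clip.
      referenceVideoUrl: 'https://example.com/reference-dance.mp4',
    },
  },
});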