
Image-to-Video Generation

Last updated February 19, 2026

Animate a static image into a video. The image you provide becomes the video content itself - you're adding motion to that exact scene.

This is different from reference-to-video, where reference images show the model what characters look like, but the video is a completely new scene.

KlingAI's image-to-video models animate images with standard or professional quality modes.

Available models:

  • klingai/kling-v3.0-i2v: Multi-shot generation, 15s clips, enhanced consistency
  • klingai/kling-v2.6-i2v: Audio-visual co-generation, cinematic motion
  • klingai/kling-v2.5-turbo-i2v: Faster generation, lower cost

Parameters:

  • prompt.image (string | Buffer, required): The image to animate. See image requirements below.
  • prompt.text (string, optional): Description of the motion. Max 2500 characters.
  • duration (5 | 10, optional): Video length in seconds. Defaults to 5.
  • providerOptions.klingai.mode ('std' | 'pro', optional): 'std' for standard quality, 'pro' for professional quality. Defaults to 'std'.
  • providerOptions.klingai.negativePrompt (string, optional): What to avoid in the video. Max 2500 characters.
  • providerOptions.klingai.cfgScale (number, optional): Prompt adherence (0-1). Higher = stricter. Defaults to 0.5. Not supported on v2.x models.
  • providerOptions.klingai.sound ('on' | 'off', optional): Generate audio. Defaults to 'off'. Requires v2.6+.
  • providerOptions.klingai.voiceList (array, optional): Voice IDs for speech. Max 2 voices. Requires v2.6+. See voice generation.
  • providerOptions.klingai.watermarkInfo (object, optional): Set { enabled: true } to generate a watermarked result.
  • providerOptions.klingai.pollIntervalMs (number, optional): How often to check task status. Defaults to 5000.
  • providerOptions.klingai.pollTimeoutMs (number, optional): Maximum wait time. Defaults to 600000 (10 minutes).

The input image (prompt.image) must meet these requirements:

  • Formats: .jpg, .jpeg, .png
  • File size: 10MB or less
  • Dimensions: Minimum 300px
  • Aspect ratio: Between 1:2.5 and 2.5:1
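
If you want to validate a local image before sending it, a minimal pre-flight check could look like the sketch below. It assumes the sharp package is available for reading dimensions; the thresholds simply mirror the list above.

import fs from 'node:fs';
import sharp from 'sharp';

// Hypothetical helper that mirrors the requirements listed above.
async function checkInputImage(path: string) {
  const allowed = ['.jpg', '.jpeg', '.png'];
  if (!allowed.some((ext) => path.toLowerCase().endsWith(ext))) {
    throw new Error('Image must be .jpg, .jpeg, or .png');
  }

  const buffer = fs.readFileSync(path);
  if (buffer.byteLength > 10 * 1024 * 1024) {
    throw new Error('Image must be 10MB or less');
  }

  // Assumes the 300px minimum applies to both width and height.
  const { width, height } = await sharp(buffer).metadata();
  if (!width || !height || width < 300 || height < 300) {
    throw new Error('Image dimensions must be at least 300px');
  }

  const ratio = width / height;
  if (ratio < 1 / 2.5 || ratio > 2.5) {
    throw new Error('Aspect ratio must be between 1:2.5 and 2.5:1');
  }

  return buffer;
}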

When using base64 encoding, submit only the raw base64 string without any prefix:

// Correct
const image = 'iVBORw0KGgoAAAANSUhEUgAAAAUA...';
 
// Incorrect - do not include data: prefix
const image = 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA...';
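
If you need the raw base64 form of a local file, Node's Buffer can produce it directly. Note that prompt.image also accepts a Buffer, so this is only necessary when you specifically want a base64 string; the filename here is a placeholder.

import fs from 'node:fs';

// Encode the file as a raw base64 string, with no data: prefix.
const image = fs.readFileSync('cat.png').toString('base64');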
klingai-image-to-video.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/cat.png',
    text: 'The cat slowly turns its head and blinks',
  },
  duration: 5,
  providerOptions: {
    klingai: {
      mode: 'std',
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Generate a video that transitions between a starting and ending image. The model interpolates the motion between the two frames.

Parameters:

  • prompt.image (string | Buffer, required): The first frame (starting image).
  • providerOptions.klingai.imageTail (string | Buffer, required): The last frame (ending image). Same format requirements as prompt.image.

The following features are mutually exclusive; only one of them can be used in a single request:

  • First/last frame (image + imageTail)
  • Motion brush (dynamicMasks / staticMask)
  • Camera control (cameraControl)
first-last-frame.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const firstFrame = fs.readFileSync('start.png');
const lastFrame = fs.readFileSync('end.png');
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: firstFrame,
    text: 'Smooth transition between the two scenes',
  },
  providerOptions: {
    klingai: {
      imageTail: lastFrame,
      mode: 'pro',
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Add speech to your video using voice IDs. Requires v2.6+ models with sound: 'on'.

Reference voices in your prompt using <<<voice_1>>> syntax, where the number matches the order in voiceList:

voice-generation.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/person.png',
    text: 'The person<<<voice_1>>> says: "Hello, welcome to my channel"',
  },
  providerOptions: {
    klingai: {
      mode: 'std',
      sound: 'on',
      voiceList: [{ voiceId: 'your_voice_id' }],
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

You can use up to 2 voices per video. Voice IDs come from KlingAI's voice customization API or system preset voices.
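
For example, a two-speaker prompt might look like the sketch below, where <<<voice_1>>> maps to the first voiceList entry and <<<voice_2>>> to the second (the image URL and voice IDs are placeholders):

import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/two-people.png',
    text: 'The man<<<voice_1>>> says: "Ready to go?" The woman<<<voice_2>>> replies: "Absolutely."',
  },
  providerOptions: {
    klingai: {
      mode: 'std',
      sound: 'on',
      // Order matters: voice_1 is the first entry, voice_2 the second.
      voiceList: [{ voiceId: 'first_voice_id' }, { voiceId: 'second_voice_id' }],
    },
  },
});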

Control camera movement during video generation. This is mutually exclusive with first/last frame and motion brush features.

Parameters:

  • providerOptions.klingai.cameraControl.type (string, required): Camera movement type. See options below.
  • providerOptions.klingai.cameraControl.config (object, optional): Movement configuration. Required when type is 'simple'.

Camera movement types:

  • 'simple': Basic movement along a single axis. Requires config.
  • 'down_back': Camera descends and moves backward. No config.
  • 'forward_up': Camera moves forward and tilts up. No config.
  • 'right_turn_forward': Rotate right, then move forward. No config.
  • 'left_turn_forward': Rotate left, then move forward. No config.

Simple camera config options (use only one, set others to 0):

  • horizontal ([-10, 10]): Camera translation along the x-axis. Negative = left.
  • vertical ([-10, 10]): Camera translation along the y-axis. Negative = down.
  • pan ([-10, 10]): Camera rotation around the y-axis. Negative = left.
  • tilt ([-10, 10]): Camera rotation around the x-axis. Negative = down.
  • roll ([-10, 10]): Camera rotation around the z-axis. Negative = counter-clockwise.
  • zoom ([-10, 10]): Focal length change. Negative = narrower field of view.
camera-control.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/landscape.png',
    text: 'A serene mountain landscape',
  },
  providerOptions: {
    klingai: {
      mode: 'std',
      cameraControl: {
        type: 'simple',
        config: {
          zoom: 5,
          horizontal: 0,
          vertical: 0,
          pan: 0,
          tilt: 0,
          roll: 0,
        },
      },
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);
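
The predefined movement types take no config object; only type: 'simple' requires one. A sketch using 'forward_up' (with a placeholder image URL) might look like this:

import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/landscape.png',
    text: 'A serene mountain landscape',
  },
  providerOptions: {
    klingai: {
      mode: 'std',
      // Predefined movements are used as-is; no config is passed.
      cameraControl: { type: 'forward_up' },
    },
  },
});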

Control which parts of the image move and how using mask images. This is mutually exclusive with first/last frame and camera control features.

Parameters:

  • providerOptions.klingai.staticMask (string, optional): Mask image for areas that should remain static.
  • providerOptions.klingai.dynamicMasks (array, optional): Array of dynamic mask configurations (up to 6).
  • providerOptions.klingai.dynamicMasks[].mask (string, required): Mask image for areas that should move.
  • providerOptions.klingai.dynamicMasks[].trajectories (array, required): Motion path coordinates. 2-77 points for a 5-second video.

Mask requirements:

  • Same format as input image (.jpg, .jpeg, .png)
  • Aspect ratio must match the input image
  • All masks (staticMask and dynamicMasks[].mask) must have identical resolution

Trajectory coordinates use the bottom-left corner of the image as origin. More points create more accurate paths.
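
Many image editors report coordinates from a top-left origin instead. If that is where your points come from, one option is to flip the y values before passing them as trajectories, as in this sketch (it assumes you know the input image's height in pixels):

// Points captured in a top-left-origin coordinate system (placeholder values).
const topLeftPoints = [
  { x: 100, y: 200 },
  { x: 200, y: 300 },
];

// Height of the input image in pixels (assumed known).
const imageHeight = 720;

// KlingAI trajectories use a bottom-left origin, so flip the y axis.
const trajectories = topLeftPoints.map(({ x, y }) => ({ x, y: imageHeight - y }));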

motion-brush.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/scene.png',
    text: 'A ball bouncing across the scene',
  },
  providerOptions: {
    klingai: {
      mode: 'std',
      dynamicMasks: [
        {
          mask: 'https://example.com/ball-mask.png',
          trajectories: [
            { x: 100, y: 200 },
            { x: 200, y: 300 },
            { x: 300, y: 200 },
            { x: 400, y: 300 },
          ],
        },
      ],
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Wan offers image-to-video with standard and flash (faster) variants, plus optional audio. It requires image URLs rather than buffers; use Vercel Blob to host local images (a sketch follows the example below).

Available models:

  • alibaba/wan-v2.6-i2v: Standard quality image animation
  • alibaba/wan-v2.6-i2v-flash: Faster generation with audio support

Parameters:

  • prompt.image (string, required): URL of the image to animate (URLs only, not buffers).
  • prompt.text (string, required): Description of the motion or animation.
  • duration (number, optional): Video length in seconds.
  • providerOptions.alibaba.audio (boolean, optional): Generate audio. Only available on flash models. Defaults to false.
  • providerOptions.alibaba.pollIntervalMs (number, optional): How often to check task status. Defaults to 5000.
  • providerOptions.alibaba.pollTimeoutMs (number, optional): Maximum wait time. Defaults to 600000 (10 minutes).
wan-image-to-video.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'alibaba/wan-v2.6-i2v-flash',
  prompt: {
    image: 'https://example.com/cat.png',
    text: 'The cat waves hello and smiles',
  },
  duration: 5,
  providerOptions: {
    alibaba: {
      audio: true,
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);
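
Because Wan only accepts URLs, a local image has to be hosted before it can be animated. One approach is a quick upload with Vercel Blob, as in this sketch (it assumes the @vercel/blob package and a configured BLOB_READ_WRITE_TOKEN):

import { put } from '@vercel/blob';
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';

// Upload the local image so Wan can fetch it by URL.
const blob = await put('cat.png', fs.readFileSync('cat.png'), {
  access: 'public',
});

const result = await generateVideo({
  model: 'alibaba/wan-v2.6-i2v',
  prompt: {
    image: blob.url,
    text: 'The cat waves hello and smiles',
  },
});

fs.writeFileSync('output.mp4', result.videos[0].uint8Array);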

Grok Imagine Video (by xAI) can animate images into videos. The output defaults to the input image's aspect ratio. If you specify aspectRatio, it will override this and stretch the image to the desired ratio.

Model details:

  • xai/grok-imagine-video: 1-15s duration, 480p or 720p resolution

Parameters:

  • prompt.image (string, required): URL of the image to animate.
  • prompt.text (string, optional): Description of the motion or animation.
  • duration (number, optional): Video length in seconds (1-15).
  • aspectRatio (string, optional): Override the input image's aspect ratio (stretches the image).
  • providerOptions.xai.resolution ('480p' | '720p', optional): Video resolution. Defaults to '480p'.
  • providerOptions.xai.pollIntervalMs (number, optional): How often to check task status. Defaults to 5000.
  • providerOptions.xai.pollTimeoutMs (number, optional): Maximum wait time. Defaults to 600000 (10 minutes).
grok-image-to-video.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'xai/grok-imagine-video',
  prompt: {
    image: 'https://example.com/cat.png',
    text: 'The cat slowly turns its head and blinks',
  },
  duration: 5,
  providerOptions: {
    xai: {
      pollTimeoutMs: 600000,
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);
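
To render at 720p or force a particular output shape (which stretches the input image), set resolution and aspectRatio explicitly. The values below are illustrative, not required defaults:

import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'xai/grok-imagine-video',
  prompt: {
    image: 'https://example.com/cat.png',
    text: 'The cat slowly turns its head and blinks',
  },
  duration: 5,
  // Stretches the input to 16:9 instead of keeping its native aspect ratio.
  aspectRatio: '16:9',
  providerOptions: {
    xai: {
      resolution: '720p',
    },
  },
});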
