
Gemini 3.1 Flash Image Preview (Nano Banana 2)

google/gemini-3.1-flash-image-preview

Gemini 3.1 Flash Image Preview (Nano Banana 2) improves visual output quality at flash-tier speed, adding Google Image Search grounding, configurable thinking levels, and new resolution and aspect ratio options, including 512p and ultra-wide formats.

Capabilities: Image Generation, Web Search, Reasoning, Vision (Image)
index.ts

import { generateText } from 'ai';

const result = await generateText({
  model: 'google/gemini-3.1-flash-image-preview',
  prompt: 'Render a picture of a red balloon.',
  providerOptions: {
    // Required for image output (see the FAQ below).
    google: { responseModalities: ['TEXT', 'IMAGE'] },
  },
});

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, see the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

This is a multimodal model: use streamText or generateText and specify responseModalities: ['TEXT', 'IMAGE'] in providerOptions.google to receive image output. You can also set thinkingConfig.thinkingLevel to 'minimal' or 'high' to control reasoning depth per request.
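The provider options above can be sketched as a small builder; imageOptions is a hypothetical convenience helper (not part of the SDK), but the option keys it emits (responseModalities, thinkingConfig.thinkingLevel) are the ones described here:

```typescript
// Builds the providerOptions payload for generateText/streamText.
// imageOptions is an illustrative helper, not a library API.
type ThinkingLevel = 'minimal' | 'high';

function imageOptions(thinkingLevel: ThinkingLevel) {
  return {
    google: {
      responseModalities: ['TEXT', 'IMAGE'], // required to receive image output
      thinkingConfig: { thinkingLevel },     // controls reasoning depth per request
    },
  };
}

// Example: pass the result as providerOptions in a call, e.g.
// generateText({ model: 'google/gemini-3.1-flash-image-preview',
//                prompt: '...', providerOptions: imageOptions('high') });
```

Keeping the options in one place makes it easy to switch between 'minimal' for fast iteration and 'high' for complex compositions.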

When to Use Gemini 3.1 Flash Image Preview (Nano Banana 2)

Best For

  • Real-world grounded imagery:

    Image generation tasks that require grounding in current subjects, landmarks, or recent events

  • Technical diagram generation:

    Configurable thinking depth improves spatial accuracy and label placement

  • Unusual aspect ratios:

    Creative asset production requiring 1:4 or 1:8 aspect ratios, or 512p resolution

  • Multimodal text and image output:

    Single-response workloads at flash-tier cost

  • Rapid complex visual iteration:

    Using the Minimal thinking level to balance speed and reasoning

Consider Alternatives When

  • Highest image quality required:

    Your workflow supports pro-tier latency and cost (consider google/gemini-3-pro-image)

  • Pure image generation API:

    You do not need multimodal text output (consider google/imagen-4.0-generate-001)

  • Simple prompts:

    Thinking levels and search grounding add unnecessary overhead

  • Video output required:

    Still images are not sufficient (consider the Veo model family)

Conclusion

Gemini 3.1 Flash Image Preview (Nano Banana 2) closes the gap between flash-tier generation speed and pro-level visual intelligence by adding search grounding, reasoning control, and broader format support. For teams that need current-event-aware imagery or complex diagrams at flash cost, it provides capabilities that earlier flash-tier models did not offer.

FAQ

How does Google Image Search grounding work?

At generation time, the model can query Google's image index to retrieve live visual data for the subject you describe. This improves rendering accuracy for subjects that may not be well-represented in static training data, such as specific real-world locations or recent events.

What thinking levels are available, and when should I use each?

The model supports two thinking levels: minimal and high. Use minimal when speed is the priority and the prompt is relatively straightforward. Use high when the prompt requires precise spatial reasoning, complex diagram layout, or multi-element compositions where reasoning before rendering reduces errors.

What new aspect ratios and resolutions does this model support?

It adds 1:4 and 1:8 aspect ratios alongside 512p resolution. These expand the model's usefulness for narrow-format creative assets such as web banners, vertical strips, and other non-standard formats.

Can I stream output from this model?

Yes. Use streamText from the AI SDK with responseModalities: ['TEXT', 'IMAGE'] in providerOptions.google.
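A streamed response interleaves text and image parts. As a sketch, the parts can be separated like this; the part shapes below mirror the AI SDK's fullStream conventions ('text-delta' and 'file' parts), and collectOutput is a hypothetical helper for illustration, not a library API:

```typescript
// Separates text deltas from generated image files in a stream of parts.
interface TextDeltaPart { type: 'text-delta'; text: string }
interface FilePart { type: 'file'; file: { mediaType: string; base64: string } }
type StreamPart = TextDeltaPart | FilePart;

function collectOutput(parts: StreamPart[]): { text: string; images: string[] } {
  let text = '';
  const images: string[] = [];
  for (const part of parts) {
    if (part.type === 'text-delta') {
      text += part.text; // accumulate streamed text
    } else if (part.file.mediaType.startsWith('image/')) {
      images.push(part.file.base64); // keep base64 payloads of generated images
    }
  }
  return { text, images };
}
```

In a real call you would iterate the live stream (for example, `for await (const part of result.fullStream)` on a streamText result) rather than an in-memory array.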

Do I need to set responseModalities to receive images?

Yes. Because this is a multimodal model, you must include responseModalities: ['TEXT', 'IMAGE'] in the provider options to receive image output. The model will not emit images without this configuration.

How does this model compare to Gemini 3 Pro Image?

Gemini 3 Pro Image targets professional and creative workflows with higher resolution, higher multi-image input limits, and more advanced compositing support. Gemini 3.1 Flash Image Preview (Nano Banana 2) prioritizes generation speed and cost efficiency while adding grounding and thinking capabilities that were absent from the original flash-tier image model.

Is this model suitable for production workloads?

Yes. Its flash-tier cost and speed profile are designed for production workloads, and using thinkingLevel: 'minimal' minimizes additional latency from the reasoning step.

What does the model stream before the generated image?

It streams the model's reasoning tokens before the generated image, giving visibility into how the model interpreted the prompt and planned the composition. This is useful for debugging prompts that produce unexpected output.