Nano Banana Pro (Gemini 3 Pro Image)

google/gemini-3-pro-image

Nano Banana Pro (Gemini 3 Pro Image) is Google's advanced native image generation model built for professional and creative workflows, with accurate diagram labeling, web-search-grounded imagery, and higher-resolution output.

Image Gen, Web Search
index.ts
import { generateText } from 'ai';

// Generated images arrive in result.files; any written content is in result.text.
const result = await generateText({
  model: 'google/gemini-3-pro-image',
  prompt: 'Render a picture of a red balloon.',
});

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
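
As a rough illustration of the two credential types mentioned above, the sketch below checks at startup that one of them is present. The helper name `resolveGatewayCredential` is hypothetical, and the environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` as well as the API-key-first precedence are assumptions of this sketch; confirm them against the AI Gateway documentation before relying on them.

```typescript
// A resolved credential: either an API key or an OIDC token.
type GatewayCredential =
  | { kind: 'api-key'; token: string }
  | { kind: 'oidc'; token: string };

// Hypothetical helper. The variable names AI_GATEWAY_API_KEY and
// VERCEL_OIDC_TOKEN, and the order they are checked in, are assumptions.
function resolveGatewayCredential(
  env: Record<string, string | undefined>,
): GatewayCredential {
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (apiKey) {
    return { kind: 'api-key', token: apiKey };
  }
  const oidcToken = env['VERCEL_OIDC_TOKEN'];
  if (oidcToken) {
    return { kind: 'oidc', token: oidcToken };
  }
  // Fail fast so a missing credential surfaces before the first request.
  throw new Error(
    'No AI Gateway credential found: provide an API key or an OIDC token.',
  );
}

// Usage in a Node.js process (not executed here):
//   const credential = resolveGatewayCredential(process.env);
```

Either way, no Google provider credential appears anywhere in the check, which matches the point above that AI Gateway handles provider credentials for you.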

Because this multimodal model produces images through generateText (not generateImage), make sure your integration reads both result.text and result.files when processing a response.
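
A minimal sketch of reading both fields follows. It assumes each entry in result.files exposes `mediaType` and `uint8Array`, as recent AI SDK versions do (older versions may use different property names). The helpers `collectOutputs` and `extensionFor` and the `image-N` naming scheme are hypothetical, introduced only for illustration.

```typescript
// Structural types covering only the fields this sketch reads.
// Assumption: files carry `mediaType` and `uint8Array`.
interface GeneratedFileLike {
  mediaType: string;
  uint8Array: Uint8Array;
}

interface MixedResultLike {
  text: string;
  files: GeneratedFileLike[];
}

interface NamedImage {
  name: string;
  bytes: Uint8Array;
}

// Hypothetical helper: derive a file extension from a media type.
function extensionFor(mediaType: string): string {
  const subtype = mediaType.split('/')[1] ?? 'bin';
  return subtype === 'jpeg' ? 'jpg' : subtype;
}

// Hypothetical helper: separate written content from generated images,
// skipping any non-image files that may be present.
function collectOutputs(
  result: MixedResultLike,
): { text: string; images: NamedImage[] } {
  const images = result.files
    .filter((file) => file.mediaType.startsWith('image/'))
    .map((file, index) => ({
      name: `image-${index + 1}.${extensionFor(file.mediaType)}`,
      bytes: file.uint8Array,
    }));
  return { text: result.text, images };
}

// Usage with the AI SDK (not executed here):
//   const result = await generateText({
//     model: 'google/gemini-3-pro-image',
//     prompt: 'Render a picture of a red balloon.',
//   });
//   const { text, images } = collectOutputs(result);
```

Keeping the extraction separate from the network call makes it easy to unit test with a stubbed response, and the returned bytes can then be written to disk or uploaded as needed.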

When to Use Nano Banana Pro (Gemini 3 Pro Image)

Best For

  • Labeled technical diagrams:

    Generating architecture charts and data flow visualizations with accurate labeling

  • High-resolution creative workflows:

    Work that requires higher-resolution images and multi-image compositing

  • Web-search-grounded imagery:

    Applications that need image generation grounded in current real-world information via web search

  • Mixed-media pipelines:

    Pipelines where a single request should return both explanatory text and accompanying images

  • Professional design and documentation:

    Workflows where image quality must match that of prior-generation flagship models

Consider Alternatives When

  • Pure image generation:

    You need images without text output (consider google/imagen-4.0-generate-001 or google/imagen-4.0-ultra-generate-001)

  • Language-only reasoning:

    Your use case has no image generation requirement (consider google/gemini-3-pro-preview)

  • Flash-tier speed and cost:

    Generation speed matters more than pro-level quality (consider google/gemini-3.1-flash-image-preview)

  • Video output required:

    Still images are not sufficient (consider the Veo model family)
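
The guidance in the two lists above can be sketched as a small selection function. The model IDs are the ones named on this page; the `Requirements` shape, the `suggestModel` name, and the order in which the checks run are assumptions made for illustration, not an official routing rule.

```typescript
// Hypothetical description of what a request needs.
interface Requirements {
  needsVideo: boolean;
  needsImages: boolean;
  needsTextWithImages: boolean;
  prioritizeSpeedAndCost: boolean;
}

// Either a concrete model ID, or a model family when this page lists no ID.
type Suggestion =
  | { kind: 'model'; id: string }
  | { kind: 'family'; name: string };

// Hypothetical helper; the precedence of the checks is an assumption.
function suggestModel(req: Requirements): Suggestion {
  if (req.needsVideo) {
    // No specific Veo model ID is given here, so only the family is returned.
    return { kind: 'family', name: 'Veo' };
  }
  if (!req.needsImages) {
    return { kind: 'model', id: 'google/gemini-3-pro-preview' };
  }
  if (!req.needsTextWithImages) {
    return { kind: 'model', id: 'google/imagen-4.0-generate-001' };
  }
  if (req.prioritizeSpeedAndCost) {
    return { kind: 'model', id: 'google/gemini-3.1-flash-image-preview' };
  }
  return { kind: 'model', id: 'google/gemini-3-pro-image' };
}
```

For example, a request that needs labeled diagrams with explanatory text and does not prioritize speed falls through every early check and resolves to google/gemini-3-pro-image.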

Conclusion

Nano Banana Pro (Gemini 3 Pro Image) fills a gap in AI-assisted technical documentation and professional creative production. It generates accurately labeled diagrams and web-search-grounded imagery at higher resolutions. For teams building tools that produce technical illustrations, enriched visual content, or composite image workflows, it exposes capabilities that simpler image-generation models do not.

FAQ

Nano Banana Pro (Gemini 3 Pro Image) is a multimodal model, not a dedicated image generation API. It returns both text and image files in a single response. In the AI SDK, you call generateText and then iterate over result.files to access the generated images.

The model handles architectural schematics, data flow charts, and similar technical visualizations where text annotations must be precisely placed and readable.

The model can query Google's index at generation time to retrieve current visual reference data. This helps render lesser-known landmarks, recent events, or real-world objects accurately rather than relying solely on training data.

Nano Banana Pro (Gemini 3 Pro Image) generates higher-resolution images than the base Nano Banana model. Check the model specs on this page for current resolution and pricing details.

Multi-image compositing is supported. Nano Banana Pro (Gemini 3 Pro Image) introduces higher multi-image input limits specifically to support compositing workflows, where multiple reference images are combined into a single generated output.
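
One way to pass several reference images in a single request is to build one user message that contains a text part followed by image parts, which is the message shape the AI SDK accepts. The sketch below is illustrative: `buildCompositeMessages` is a hypothetical helper, and `maxImages` is a caller-supplied value because this page does not state the actual input limit.

```typescript
// Content part shapes, following the AI SDK's text and image parts.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: Uint8Array };

interface UserMessage {
  role: 'user';
  content: Array<TextPart | ImagePart>;
}

// Hypothetical helper: combine an instruction with reference images.
// `maxImages` is an assumption supplied by the caller; check the model
// specs for the real limit.
function buildCompositeMessages(
  instruction: string,
  references: Uint8Array[],
  maxImages: number,
): UserMessage[] {
  if (references.length === 0) {
    throw new Error('At least one reference image is required.');
  }
  if (references.length > maxImages) {
    throw new Error(
      `Received ${references.length} images, but the limit is ${maxImages}.`,
    );
  }
  const imageParts = references.map(
    (image): ImagePart => ({ type: 'image', image }),
  );
  return [
    {
      role: 'user',
      content: [{ type: 'text', text: instruction }, ...imageParts],
    },
  ];
}

// Usage with the AI SDK (not executed here):
//   const result = await generateText({
//     model: 'google/gemini-3-pro-image',
//     messages: buildCompositeMessages('Combine these into one scene.', refs, 4),
//   });
```

Validating the image count before sending avoids spending a request on input the model would reject.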

You do not need your own provider credentials. AI Gateway manages all provider credentials: you authenticate with a Vercel API key or OIDC token, and AI Gateway routes requests to the appropriate provider automatically.

Nano Banana Pro (Gemini 3 Pro Image) targets advanced and professional use cases. It adds diagram labeling accuracy, web-search grounding for up-to-date imagery, higher resolution output, and higher multi-image input limits. These capabilities are not available in the base flash-tier image model.

Text and images can be returned together. Because this is a multimodal model called through generateText, a single request returns result.text for any written content and result.files for the generated images, allowing mixed-media responses in one API call.