Gemini 3.1 Flash Image Preview (Nano Banana 2)
Gemini 3.1 Flash Image Preview (Nano Banana 2) improves visual output quality at flash-tier speed, adding Google Image Search grounding, configurable thinking levels, and new resolution and aspect ratio options including 512p and ultra-wide formats.
import { generateText } from 'ai'

const result = await generateText({
  model: 'google/gemini-3.1-flash-image-preview',
  prompt: 'Render a picture of a red balloon.',
})

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
This is a multimodal model: use streamText or generateText and specify responseModalities: ['TEXT', 'IMAGE'] in providerOptions.google to receive image output. You can also set thinkingConfig.thinkingLevel to 'minimal' or 'high' to control reasoning depth per request.
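The shape of these provider options can be sketched as a plain object. This is a minimal illustration of the option names described above (responseModalities and thinkingConfig.thinkingLevel under providerOptions.google); treat it as a sketch of the request configuration, not a complete call:

```typescript
// Provider options as described above: responseModalities requests image
// output alongside text, and thinkingConfig.thinkingLevel controls
// reasoning depth per request ('minimal' or 'high').
const providerOptions = {
  google: {
    responseModalities: ['TEXT', 'IMAGE'],
    thinkingConfig: { thinkingLevel: 'minimal' }, // or 'high'
  },
} as const

// Passed alongside model and prompt, e.g.:
// await generateText({
//   model: 'google/gemini-3.1-flash-image-preview',
//   prompt: 'Render a picture of a red balloon.',
//   providerOptions,
// })
console.log(providerOptions.google.responseModalities.join(','))
```

Without responseModalities including 'IMAGE', the model returns text only.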
When to Use Gemini 3.1 Flash Image Preview (Nano Banana 2)
Best For
Real-world grounded imagery: image generation tasks that require grounding in current subjects, landmarks, or recent events
Technical diagram generation: configurable thinking depth improves spatial accuracy and label placement
Unusual aspect ratios: creative asset production requiring 1:4 or 1:8 ratios or 512p resolution
Multimodal text and image output: single-response workloads at flash-tier cost
Rapid complex visual iteration: using the minimal thinking level to balance speed and reasoning
Consider Alternatives When
Highest image quality required: your workflow supports pro-tier latency and cost (consider google/gemini-3-pro-image)
Pure image generation API: you do not need multimodal text output (consider google/imagen-4.0-generate-001)
Simple prompts: thinking levels and search grounding add unnecessary overhead
Video output required: still images are not sufficient (consider the Veo model family)
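The selection guidance above can be summarized as a small decision helper. This is a hypothetical sketch, not part of any SDK: the function name and criteria flags are invented for illustration, while the model IDs are the ones named on this page (the Veo family is a group of models, so it is represented here by a placeholder string):

```typescript
// Hypothetical helper mirroring the "Best For" / "Consider Alternatives"
// guidance above. Flags and function name are illustrative only.
type Needs = {
  highestQuality?: boolean   // pro-tier latency and cost are acceptable
  textOutputNeeded?: boolean // multimodal text alongside images
  videoNeeded?: boolean      // still images are not sufficient
}

function pickModel(needs: Needs): string {
  if (needs.videoNeeded) return 'veo-family' // placeholder: see the Veo model family
  if (needs.highestQuality) return 'google/gemini-3-pro-image'
  if (!needs.textOutputNeeded) return 'google/imagen-4.0-generate-001'
  return 'google/gemini-3.1-flash-image-preview'
}

console.log(pickModel({ textOutputNeeded: true }))
// → google/gemini-3.1-flash-image-preview
```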
Conclusion
Gemini 3.1 Flash Image Preview (Nano Banana 2) closes the gap between flash-tier generation speed and pro-level visual intelligence by adding search grounding, reasoning control, and broader format support. For teams that need current-event-aware imagery or complex diagrams at flash cost, it provides capabilities that earlier flash-tier models did not offer.
FAQ
How does Google Image Search grounding work?
At generation time, the model can query Google's image index to retrieve live visual data for the subject you describe. This improves rendering accuracy for subjects that may not be well-represented in static training data, such as specific real-world locations or recent events.
What thinking levels are available, and when should I use each?
The model supports two thinking levels: minimal and high. Use minimal when speed is the priority and the prompt is relatively straightforward. Use high when the prompt requires precise spatial reasoning, complex diagram layout, or multi-element compositions where reasoning before rendering reduces errors.
What new aspect ratios and resolutions are supported?
The model adds 1:4 and 1:8 aspect ratios alongside 512p resolution. These expand the model's usefulness for narrow-format creative assets such as web banners, vertical strips, and other non-standard formats.
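The pixel arithmetic for these narrow formats can be sketched as follows, assuming "512p" denotes a 512-pixel short side (an assumption for illustration, not a documented definition):

```typescript
// Sketch: derive output dimensions for a given aspect ratio and short
// side, assuming "512p" means a 512-pixel short side.
function dimensionsFor(ratio: string, shortSide: number): [number, number] {
  const [w, h] = ratio.split(':').map(Number)
  // Scale so the smaller ratio term maps to the short side.
  const scale = shortSide / Math.min(w, h)
  return [Math.round(w * scale), Math.round(h * scale)]
}

console.log(dimensionsFor('1:8', 512)) // → [ 512, 4096 ]
```

Under this assumption, a 1:8 banner at 512p would be 512 by 4096 pixels.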
Can I stream output with the AI SDK?
Yes. Use streamText from the AI SDK with responseModalities: ['TEXT', 'IMAGE'] in providerOptions.google.
Is responseModalities required to receive images?
Yes. Because this is a multimodal model, you must include responseModalities: ['TEXT', 'IMAGE'] in the provider options to receive image output. The model will not emit images without this configuration.
How does this model differ from Gemini 3 Pro Image?
Gemini 3 Pro Image targets professional and creative workflows with higher resolution, higher multi-image input limits, and more advanced compositing support. Gemini 3.1 Flash Image Preview (Nano Banana 2) prioritizes generation speed and cost efficiency while adding grounding and thinking capabilities that were absent from the original flash-tier image model.
Is it suitable for production workloads?
Yes, its flash-tier cost and speed profile are designed for production workloads. Using thinkingLevel: 'minimal' minimizes additional latency from the reasoning step.
What does the streamed thinking output show?
It streams the model's reasoning tokens before the generated image, giving visibility into how the model interpreted the prompt and planned the composition. This is useful for debugging prompts that produce unexpected output.