Gemini 3.1 Flash Image Preview (Nano Banana 2)
Gemini 3.1 Flash Image Preview (Nano Banana 2) improves visual output quality at flash-tier speed, adding Google Image Search grounding, configurable thinking levels, and new resolution and aspect ratio options including 512p and ultra-wide formats.
import { generateText } from 'ai';

const result = await generateText({
  model: 'google/gemini-3.1-flash-image-preview',
  prompt: 'Render a picture of a red balloon.',
});

Playground
Try out Gemini 3.1 Flash Image Preview (Nano Banana 2) by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
About Gemini 3.1 Flash Image Preview (Nano Banana 2)
Gemini 3.1 Flash Image Preview, codenamed Nano Banana 2, improves on the visual quality of the prior flash-tier image model while preserving the generation speed and cost profile that make flash-tier models viable for production workloads. Three additions expand what flash-tier image generation can do.
The first is Google Image Search grounding. The model can retrieve live visual reference data from Google's index at generation time, which allows it to render lesser-known landmarks, brand-specific objects, and recent real-world subjects accurately. The second is configurable thinking levels. Gemini 3.1 Flash Image Preview (Nano Banana 2) introduces Minimal and High thinking modes, which let the model reason through complex prompts before rendering. The third is expanded output options: 512p resolution and the 1:4 and 1:8 aspect ratios join the existing format options, opening the model to narrow-format creative assets like banners and vertical media strips.
What To Consider When Choosing a Provider
- Configuration: This is a multimodal model. Use `streamText` or `generateText` and specify `responseModalities: ['TEXT', 'IMAGE']` in `providerOptions.google` to receive image output. You can also set `thinkingConfig.thinkingLevel` to `'minimal'` or `'high'` to control reasoning depth per request.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
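The configuration described above can be sketched as a small options builder. The option names (`responseModalities`, `thinkingConfig.thinkingLevel`) and the model slug come from this page; `buildImageRequest` and the `ImageRequest` type are hypothetical helpers introduced only for illustration, and the exact shape the AI SDK accepts may differ by version.

```typescript
// Hypothetical helper: assembles the options described in the Configuration bullet.
type ThinkingLevel = 'minimal' | 'high';

interface ImageRequest {
  model: string;
  prompt: string;
  providerOptions: {
    google: {
      responseModalities: string[];
      thinkingConfig: { thinkingLevel: ThinkingLevel };
    };
  };
}

function buildImageRequest(prompt: string, thinkingLevel: ThinkingLevel): ImageRequest {
  return {
    model: 'google/gemini-3.1-flash-image-preview',
    prompt,
    providerOptions: {
      google: {
        // Per this page, images are only returned when 'IMAGE' is requested.
        responseModalities: ['TEXT', 'IMAGE'],
        // Reasoning depth is chosen per request.
        thinkingConfig: { thinkingLevel },
      },
    },
  };
}

const imageRequest = buildImageRequest('Render a labeled diagram of the water cycle.', 'high');
console.log(JSON.stringify(imageRequest.providerOptions, null, 2));
```

The resulting object is intended to be passed to `generateText` or `streamText` from the AI SDK; that call is left out here so the sketch runs without network access or credentials.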
When to Use Gemini 3.1 Flash Image Preview (Nano Banana 2)
Best For
- Real-world grounded imagery: Image generation tasks that require grounding in current subjects, landmarks, or recent events
- Technical diagram generation: Configurable thinking depth improves spatial accuracy and label placement
- Unusual aspect ratios: Creative asset production requiring 1:4, 1:8 ratios or 512p resolution
- Multimodal text and image output: Single-response workloads at flash-tier cost
- Rapid complex visual iteration: Using the `Minimal` thinking level to balance speed and reasoning
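One way to apply the thinking-level guidance in the list above is a small selection heuristic. Everything in this sketch is an assumption made for illustration: the `PromptTraits` fields, the `chooseThinkingLevel` function, and the threshold of three elements are not part of the model or the AI SDK.

```typescript
type ThinkingLevel = 'minimal' | 'high';

// Hypothetical description of a prompt; these fields are illustrative only.
interface PromptTraits {
  needsPreciseLayout: boolean; // diagrams, label placement, spatial accuracy
  elementCount: number;        // distinct subjects the image must contain
  latencySensitive: boolean;   // interactive or rapid-iteration use
}

function chooseThinkingLevel(traits: PromptTraits): ThinkingLevel {
  // Layout-heavy prompts are where reasoning before rendering helps most.
  if (traits.needsPreciseLayout) return 'high';
  // Assumed threshold: more than three elements counts as a complex composition,
  // unless the caller has said latency matters more.
  if (traits.elementCount > 3 && !traits.latencySensitive) return 'high';
  return 'minimal';
}

console.log(chooseThinkingLevel({ needsPreciseLayout: true, elementCount: 2, latencySensitive: false }));
console.log(chooseThinkingLevel({ needsPreciseLayout: false, elementCount: 1, latencySensitive: true }));
```

The heuristic deliberately defaults to `'minimal'`, so the extra reasoning step is only paid for when the prompt looks like it needs it.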
Consider Alternatives When
- Highest image quality required: Your workflow supports pro-tier latency and cost (consider `google/gemini-3-pro-image`)
- Pure image generation API: You do not need multimodal text output (consider `google/imagen-4.0-generate-001`)
- Simple prompts: Thinking levels and search grounding add unnecessary overhead
- Video output required: Still images are not sufficient (consider the Veo model family)
Conclusion
Gemini 3.1 Flash Image Preview (Nano Banana 2) closes the gap between flash-tier generation speed and pro-level visual intelligence by adding search grounding, reasoning control, and broader format support. For teams that need current-event-aware imagery or complex diagrams at flash cost, it provides capabilities that earlier flash-tier models did not offer.
Frequently Asked Questions
How does Google Image Search grounding work in this model?
At generation time, the model can query Google's image index to retrieve live visual data for the subject you describe. This improves rendering accuracy for subjects that may not be well-represented in static training data, such as specific real-world locations or recent events.
What are the available thinking levels and when should I use each?
The available levels are `minimal` and `high`. Use `minimal` when speed is the priority and the prompt is relatively straightforward. Use `high` when the prompt requires precise spatial reasoning, complex diagram layout, or multi-element compositions where reasoning before rendering reduces errors.
What new aspect ratios are available in Gemini 3.1 Flash Image Preview (Nano Banana 2)?
The model adds 1:4 and 1:8 aspect ratios alongside 512p resolution. These expand its usefulness for narrow-format creative assets such as web banners, vertical strips, and other non-standard formats.
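When a layout slot has fixed pixel dimensions, a small helper can pick the closest ratio to request. This is a sketch under stated assumptions: ratios are read as width:height, the candidate list is supplied by the caller, and `'1:1'` is included only as an assumed example of a pre-existing option. `ratioValue` and `closestAspectRatio` are hypothetical names, not part of any SDK.

```typescript
// Parse a ratio string such as '1:4' into width / height.
function ratioValue(ratio: string): number {
  const parts = ratio.split(':').map(Number);
  const w = parts[0] ?? NaN;
  const h = parts[1] ?? NaN;
  if (parts.length !== 2 || !Number.isFinite(w) || !Number.isFinite(h) || w <= 0 || h <= 0) {
    throw new Error(`Invalid aspect ratio: ${ratio}`);
  }
  return w / h;
}

// Pick the candidate whose ratio is closest to width / height.
// Distances are compared in log space so that 1:4 and 4:1 are treated symmetrically.
function closestAspectRatio(width: number, height: number, candidates: string[]): string {
  if (candidates.length === 0) throw new Error('No candidate ratios given');
  const target = Math.log(width / height);
  let best = '';
  let bestDistance = Infinity;
  for (const candidate of candidates) {
    const distance = Math.abs(Math.log(ratioValue(candidate)) - target);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}

// '1:4' and '1:8' come from this page; '1:1' is an assumed existing option.
const candidateRatios = ['1:1', '1:4', '1:8'];
console.log(closestAspectRatio(160, 600, candidateRatios)); // '1:4'
console.log(closestAspectRatio(100, 800, candidateRatios)); // '1:8'
```

A 160x600 vertical banner slot maps to 1:4 because its ratio (about 0.27) is much closer to 0.25 than to 0.125 or 1.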
Does this model support streaming?
Yes. Use `streamText` from the AI SDK with `responseModalities: ['TEXT', 'IMAGE']` in `providerOptions.google`.
Do I need to set `responseModalities` explicitly?
Yes. Because this is a multimodal model, you must include `responseModalities: ['TEXT', 'IMAGE']` in the provider options to receive image output. The model will not emit images without this configuration.
How does this model compare to Gemini 3 Pro Image?
Gemini 3 Pro Image targets professional and creative workflows with higher resolution, higher multi-image input limits, and more advanced compositing support. Gemini 3.1 Flash Image Preview (Nano Banana 2) prioritizes generation speed and cost efficiency while adding grounding and thinking capabilities that were absent from the original flash-tier image model.
Can I use this model for real-time applications?
Yes, its flash-tier cost and speed profile are designed for production workloads. Using `thinkingLevel: 'minimal'` minimizes additional latency from the reasoning step.
What does `includeThoughts: true` return?
It streams the model's reasoning tokens before the generated image, giving visibility into how the model interpreted the prompt and planned the composition. This is useful for debugging prompts that produce unexpected output.
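For debugging, it can help to keep the reasoning separate from the final text and images. The sketch below assumes a simplified, hypothetical `StreamPart` shape; real AI SDK stream parts carry more fields and may use different type names depending on the SDK version, so treat this as an outline of the idea and not as the SDK's actual types.

```typescript
// Simplified, hypothetical part shapes used only for this sketch.
type StreamPart =
  | { type: 'reasoning'; text: string }
  | { type: 'text'; text: string }
  | { type: 'image'; mediaType: string; base64: string };

interface CollectedOutput {
  reasoning: string;
  text: string;
  images: { mediaType: string; base64: string }[];
}

// Group parts by kind so reasoning can be logged apart from the rendered result.
function collectParts(parts: StreamPart[]): CollectedOutput {
  const collected: CollectedOutput = { reasoning: '', text: '', images: [] };
  for (const part of parts) {
    switch (part.type) {
      case 'reasoning':
        collected.reasoning += part.text;
        break;
      case 'text':
        collected.text += part.text;
        break;
      case 'image':
        collected.images.push({ mediaType: part.mediaType, base64: part.base64 });
        break;
    }
  }
  return collected;
}

// Made-up sample data standing in for a streamed response.
const sampleParts: StreamPart[] = [
  { type: 'reasoning', text: 'Plan: center the balloon. ' },
  { type: 'reasoning', text: 'Use a plain sky background.' },
  { type: 'text', text: 'Here is your balloon.' },
  { type: 'image', mediaType: 'image/png', base64: 'iVBORw0KGgo=' },
];

const collectedOutput = collectParts(sampleParts);
console.log(collectedOutput.reasoning);
console.log(collectedOutput.images.length);
```

In a real integration the parts would arrive incrementally from the stream, and the same grouping could be done inside the loop that consumes them.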