Nano Banana (Gemini 2.5 Flash Image)
Nano Banana (Gemini 2.5 Flash Image) is Google's native image generation and editing model, combining multimodal world knowledge with character consistency, targeted prompt-based edits, and multi-image fusion in a single model.
```typescript
import { generateText } from 'ai';

const result = await generateText({
  model: 'google/gemini-2.5-flash-image',
  prompt: 'Render a picture of a red balloon.',
});
```

Playground
Try out Nano Banana (Gemini 2.5 Flash Image) by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
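The `generateText` call at the top of this page returns any generated images alongside text. Assuming AI SDK 5, they arrive on `result.files`, each with a `mediaType` and the raw bytes (`uint8Array`). The sketch below declares that shape locally as `GeneratedImage` so it runs without the package; `collectImages` and its file-naming scheme are hypothetical.

```typescript
// Local stand-in for the file objects on `result.files`
// (assumed AI SDK 5 shape: a media type plus raw bytes).
interface GeneratedImage {
  mediaType: string;
  uint8Array: Uint8Array;
}

// Hypothetical mapping from image media types to file extensions.
const EXTENSIONS: Record<string, string | undefined> = {
  'image/png': 'png',
  'image/jpeg': 'jpg',
  'image/webp': 'webp',
};

// Keep only image outputs and give each one a sequential filename.
function collectImages(
  files: GeneratedImage[],
  prefix = 'output',
): { filename: string; bytes: Uint8Array }[] {
  return files
    .filter((file) => file.mediaType.startsWith('image/'))
    .map((file, index) => {
      const ext = EXTENSIONS[file.mediaType] ?? 'bin';
      return { filename: `${prefix}-${index}.${ext}`, bytes: file.uint8Array };
    });
}
```

In an application you would call `collectImages(result.files)` and write each entry to disk, for example with `writeFile` from `node:fs/promises`.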
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
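Provider preference can also be expressed in code. The sketch below assumes AI Gateway's routing options are passed as `providerOptions.gateway`, where `order` sets the order in which providers are tried and `only` restricts routing to the listed providers. `gatewayProviderOptions` is a hypothetical helper, and the slugs in the usage note are assumptions; copy the real slugs from the provider list.

```typescript
// Assumed shape of AI Gateway routing options under `providerOptions.gateway`.
interface GatewayRouting {
  order?: string[]; // providers to try, in this order
  only?: string[]; // restrict routing to these providers
}

// Build a `providerOptions` object from a preferred provider list.
// When `strict` is true, also set `only` so routing is limited to the list.
function gatewayProviderOptions(
  preferred: string[],
  strict = false,
): { gateway: GatewayRouting } {
  if (preferred.length === 0) {
    throw new Error('at least one provider slug is required');
  }
  const routing: GatewayRouting = { order: [...preferred] };
  if (strict) routing.only = [...preferred];
  return { gateway: routing };
}
```

Usage would look like `providerOptions: gatewayProviderOptions(['vertex', 'google'])` inside the `generateText` options, with `vertex` and `google` standing in for whatever slugs the provider list shows.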
About Nano Banana (Gemini 2.5 Flash Image)
Google introduced Nano Banana (Gemini 2.5 Flash Image), codenamed nano-banana, on August 26, 2025 as a direct response to developer feedback on the earlier native image generation in Gemini 2.0 Flash. Developers valued 2.0 Flash's low latency and ease of use but asked for higher image quality and more powerful creative control.
Four capabilities define the model:
- Character consistency: it can place the same character or object into different environments, generate a product from multiple angles in new settings, or produce consistent brand assets across a series of prompts while preserving the subject's appearance.
- Prompt-based image editing: natural-language instructions drive targeted local edits (blurring a background, removing a stain, altering a pose, colorizing a black-and-white photo) in a single call.
- Native world knowledge: unlike prior image generation models that excelled at aesthetics but lacked semantic grounding, Nano Banana (Gemini 2.5 Flash Image) draws on Gemini's world knowledge to interpret hand-drawn diagrams, answer questions grounded in real-world understanding, and follow complex editing instructions in one step.
- Multi-image fusion: the model accepts multiple input images and merges them, enabling product placement into new scenes, room restyling from a reference texture, and image-to-image blending.
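With the AI SDK, a prompt-based edit is a multimodal message rather than a plain `prompt` string: the source image travels as an image part next to the text instruction. The sketch below assumes AI SDK 5's content-part shapes (`{ type: 'image', image, mediaType }` and `{ type: 'text', text }`), declared locally so the snippet runs without the package; `editRequest` is a hypothetical helper.

```typescript
// Local stand-ins for AI SDK user-message content parts (assumed shapes).
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: Uint8Array | URL; mediaType?: string };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Build a single-turn edit request: the source image, then the instruction.
function editRequest(
  source: Uint8Array | URL,
  instruction: string,
  mediaType = 'image/png',
): UserMessage[] {
  if (instruction.trim() === '') throw new Error('edit instruction is empty');
  return [
    {
      role: 'user',
      content: [
        { type: 'image', image: source, mediaType },
        { type: 'text', text: instruction },
      ],
    },
  ];
}
```

Pass the returned array as `messages` (instead of `prompt`) to `generateText` with `model: 'google/gemini-2.5-flash-image'`; the edited image, if any, comes back on `result.files`. Placing the image before the instruction is a choice of this sketch, not a documented requirement.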
All images created or edited with Nano Banana (Gemini 2.5 Flash Image) include an invisible SynthID digital watermark for downstream identification of AI-generated or AI-edited content.
What To Consider When Choosing a Provider
- Pricing: Image generation from this model is priced per image: N/A, with each image counted as 1,290 output tokens under Google's billing. All other input and output modalities follow Gemini 2.5 Flash rates.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
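The AI SDK reads gateway credentials from the environment, typically an `AI_GATEWAY_API_KEY` for API-key auth or a `VERCEL_OIDC_TOKEN` when running on Vercel with OIDC. A fail-fast check at startup makes a missing credential obvious before the first request. The sketch below is a hypothetical helper; the precedence it applies (API key before OIDC token) is an assumption, not documented SDK behavior.

```typescript
type GatewayAuth =
  | { kind: 'api-key'; value: string }
  | { kind: 'oidc'; value: string };

// Pick a credential from environment variables.
// Assumed precedence: API key first, then OIDC token.
function resolveGatewayAuth(env: Record<string, string | undefined>): GatewayAuth {
  const apiKey = env['AI_GATEWAY_API_KEY']?.trim();
  if (apiKey) return { kind: 'api-key', value: apiKey };
  const oidc = env['VERCEL_OIDC_TOKEN']?.trim();
  if (oidc) return { kind: 'oidc', value: oidc };
  throw new Error('Set AI_GATEWAY_API_KEY or provide a VERCEL_OIDC_TOKEN.');
}
```

In a Node service you might call `resolveGatewayAuth(process.env)` once at startup and let the thrown error stop the process if neither variable is set.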
When to Use Nano Banana (Gemini 2.5 Flash Image)
Best For
- Visual storytelling and character consistency: Campaigns, comics, or narrative applications that require the same character to appear coherently across multiple distinct images
- Automated product photography: Generating a product catalog from a single reference image, showing items in multiple settings and angles at scale
- Prompt-driven photo editing: Building user-facing editing tools that accept natural language instructions to perform precise, targeted modifications to uploaded images
- Multi-image composition workflows: Fusing product images into lifestyle scenes, restyling interiors with reference textures, or merging source materials into a single photorealistic output
- Education and knowledge-grounded visuals: Generating diagrams, illustrations, or annotated visuals that require semantic understanding of real-world concepts rather than purely aesthetic generation
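Catalog-style workloads such as automated product photography usually fan out one request per setting or angle. A minimal sketch, assuming you want to cap how many requests are in flight at once: `mapWithConcurrency` is a hypothetical, library-free helper that returns results in input order. The limit is yours to choose; nothing here reflects a documented rate limit.

```typescript
// Run `task` over every item with at most `limit` calls in flight,
// returning results in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error('limit must be a positive integer');
  }
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unprocessed index. `next++` runs
  // synchronously between awaits, so no two workers claim the same index.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

A caller might pass a list of setting descriptions and a `task` that wraps the `generateText` call shown at the top of the page, building the prompt from each setting. If any task rejects, the returned promise rejects while other in-flight work finishes in the background and is discarded, so add your own retry or error handling if partial results matter.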
Consider Alternatives When
- Text-only output needed: Image generation would add unnecessary cost and complexity
- Video generation required: Still image output is not sufficient and a video model is the right fit
- Latency-sensitive text pipelines: A standard Gemini 2.5 Flash model is more appropriate for purely text workflows
- Embedding or retrieval workloads: A dedicated embedding model architecture is required
Conclusion
Nano Banana (Gemini 2.5 Flash Image) is a purpose-built native image generation model that advances beyond aesthetic generation by grounding output in Gemini's world knowledge, enabling use cases that depend on semantic accuracy, character consistency, and precise instruction-following at the pixel level. For teams building image-centric applications, editing tools, or automated creative pipelines, it delivers a unified model that handles generation and editing in a single API call.
Frequently Asked Questions
What is the per-image cost for Nano Banana (Gemini 2.5 Flash Image)?
Image output is billed per image, with each image counted as 1,290 output tokens; current pricing is shown on this page. Because AI Gateway routes across providers, rates may vary by provider.
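Because each generated image is counted as 1,290 output tokens (per the billing note under What To Consider When Choosing a Provider), image-output tokens scale linearly with the number of images. The sketch below turns that into a rough estimate. The per-million-token rate is a caller-supplied assumption, since no rate is stated here, and the helpers ignore input tokens and any text output, which follow Gemini 2.5 Flash rates.

```typescript
// Per the billing note, each generated image counts as 1,290 output tokens.
const TOKENS_PER_IMAGE = 1290;

// Output tokens billed for `imageCount` generated images.
function imageOutputTokens(imageCount: number): number {
  if (!Number.isInteger(imageCount) || imageCount < 0) {
    throw new Error('imageCount must be a non-negative integer');
  }
  return imageCount * TOKENS_PER_IMAGE;
}

// Estimated image-output cost given an output rate in USD per 1M tokens.
// The rate is an assumption you supply; check the current price on this page.
function imageOutputCostUsd(imageCount: number, usdPerMillionOutputTokens: number): number {
  return (imageOutputTokens(imageCount) * usdPerMillionOutputTokens) / 1_000_000;
}
```

For example, 100 images correspond to 129,000 output tokens.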
Can the model maintain a character's appearance across multiple generated images?
Yes. Character consistency is one of the model's four headline capabilities: it can reproduce the same character or object in different environments, angles, and settings while preserving its visual identity.
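On the client side, one way to lean on this capability is to keep a single reference image constant and vary only the scene description. The sketch below builds one request per scene in the AI SDK's `messages` format; the local types are assumed to mirror AI SDK 5's text and image content parts, and `sceneRequests` plus its prompt wording are hypothetical. Consistency itself comes from the model; the helper only keeps the inputs uniform.

```typescript
// Local stand-ins for AI SDK user-message content parts (assumed shapes).
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: Uint8Array | URL; mediaType?: string };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// One request per scene, each pairing the same reference image with a
// scene-specific instruction that asks the model to keep the subject unchanged.
function sceneRequests(
  reference: Uint8Array | URL,
  subject: string,
  scenes: readonly string[],
): UserMessage[][] {
  return scenes.map((scene): UserMessage[] => [
    {
      role: 'user',
      content: [
        { type: 'image', image: reference },
        {
          type: 'text',
          text: `Keep ${subject} identical to the reference image, shown ${scene}.`,
        },
      ],
    },
  ]);
}
```

Each inner array can be sent as `messages` in its own `generateText` call.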
What types of prompt-based edits can the model perform?
You can perform edits like blurring backgrounds, removing subjects from scenes, adding or replacing elements, changing colors, and applying style transfers. The model handles these through natural language prompts combined with input images.
How does multi-image fusion work?
The model accepts multiple images as input and can merge them in a single prompt. For example, it can place a product into a new scene, restyle a room using a reference texture or color scheme, or blend two source images together.
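A fusion request is a single user message carrying several image parts plus the instruction. In the sketch below (a hypothetical helper with local types assumed to mirror AI SDK 5's content parts), each image is preceded by a short text label so the instruction can refer to "Image 1", "Image 2", and so on; the labeling is a prompt-design choice, not an API requirement.

```typescript
// Local stand-ins for AI SDK user-message content parts (assumed shapes).
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: Uint8Array | URL; mediaType?: string };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Combine several labeled input images and one instruction into one message.
function fusionRequest(
  images: ReadonlyArray<{ data: Uint8Array | URL; mediaType?: string }>,
  instruction: string,
): UserMessage[] {
  if (images.length < 2) throw new Error('fusion needs at least two images');
  const content: Array<TextPart | ImagePart> = [];
  images.forEach((img, i) => {
    content.push({ type: 'text', text: `Image ${i + 1}:` });
    const part: ImagePart = { type: 'image', image: img.data };
    if (img.mediaType) part.mediaType = img.mediaType; // only set when known
    content.push(part);
  });
  content.push({ type: 'text', text: instruction });
  return [{ role: 'user', content }];
}
```

For instance, `fusionRequest([lamp, room], 'Place the lamp from Image 1 in the room from Image 2.')` yields a message you can pass as `messages` to `generateText`.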
What is SynthID and are outputs watermarked?
SynthID is Google's invisible digital watermark technology. All images created or edited with Nano Banana (Gemini 2.5 Flash Image) include a SynthID watermark that allows them to be identified as AI-generated or AI-edited.
What makes this model's world knowledge capability distinct from prior image generation models?
Previous image generation models excelled at aesthetics but lacked deep semantic understanding. Nano Banana (Gemini 2.5 Flash Image) draws on Gemini's world knowledge to interpret hand-drawn diagrams, reason about real-world questions, and follow complex multi-step editing instructions in a single generation step.
What are the known limitations?
At preview launch, Google noted that long-form text rendering within images, the reliability of character consistency, and the factual accuracy of fine image details still needed work, and said it is actively improving these areas.