Nano Banana (Gemini 2.5 Flash Image)
Nano Banana (Gemini 2.5 Flash Image) is Google's native image generation and editing model, combining multimodal world knowledge with character consistency, targeted prompt-based edits, and multi-image fusion in a single model.
import { generateText } from 'ai';

const result = await generateText({
  model: 'google/gemini-2.5-flash-image',
  prompt: 'Render a picture of a red balloon.',
});

Frequently Asked Questions
What is the per-image cost for Nano Banana (Gemini 2.5 Flash Image)?
Current pricing is shown on this page. AI Gateway routes across providers, and rates may vary by provider.
Can the model maintain a character's appearance across multiple generated images?
Yes. Character consistency is one of the model's four headline capabilities, alongside world knowledge, prompt-based editing, and multi-image fusion. The model can reproduce the same character or object in different environments, angles, and settings while preserving visual identity.
What types of prompt-based edits can the model perform?
You can perform edits like blurring backgrounds, removing subjects from scenes, adding or replacing elements, changing colors, and applying style transfers. The model handles these through natural language prompts combined with input images.
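As a rough illustration of combining a natural language instruction with an input image, the sketch below builds a single user message containing one text part and one image part. The helper name buildEditMessage and the local types are hypothetical. The part shape ({ type: 'text' } and { type: 'image' } with a mediaType field) is assumed to match the AI SDK's message format in recent versions, so check it against the version you have installed. The four bytes stand in for real image data.

```typescript
// Local structural types that mirror the AI SDK user-message shape.
// Assumption: field names (`image`, `mediaType`) follow recent AI SDK versions.
type TextPart = { type: 'text'; text: string };
type ImagePart = {
  type: 'image';
  image: Uint8Array | string; // raw bytes or a base64 string
  mediaType?: string;
};
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Hypothetical helper: pair one edit instruction with one source image.
function buildEditMessage(
  instruction: string,
  image: Uint8Array | string,
  mediaType: string = 'image/png',
): UserMessage {
  const text = instruction.trim();
  if (text.length === 0) {
    throw new Error('An edit instruction is required.');
  }
  return {
    role: 'user',
    content: [
      { type: 'text', text },
      { type: 'image', image, mediaType },
    ],
  };
}

// Placeholder bytes (the PNG magic number), not a real image.
const sourceImage = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);

const editMessage = buildEditMessage(
  'Blur the background and keep the subject sharp.',
  sourceImage,
);
```

Under these assumptions, the message would be passed to generateText as messages: [editMessage] in place of the prompt field, using the same model string as the example at the top of this page.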
How does multi-image fusion work?
The model accepts multiple images as input and can merge them in a single prompt, for example, placing a product into a new scene, restyling a room using a reference texture or color scheme, or blending two source images together.
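One way to picture a fusion request is as a single user message that carries the instruction followed by every source image. The sketch below is a minimal version of that idea. The helper buildFusionMessage, its two-image minimum, and the local types are illustrative assumptions rather than part of any documented API, and the byte arrays are placeholders for real image data.

```typescript
// Local structural types; assumed to be compatible with the AI SDK's
// user-message content parts, so verify against your installed version.
type InstructionPart = { type: 'text'; text: string };
type SourceImagePart = { type: 'image'; image: Uint8Array | string };
type FusionMessage = {
  role: 'user';
  content: Array<InstructionPart | SourceImagePart>;
};

// Hypothetical helper: one instruction plus two or more source images.
function buildFusionMessage(
  instruction: string,
  images: Array<Uint8Array | string>,
): FusionMessage {
  if (images.length < 2) {
    throw new Error('Fusion needs at least two input images.');
  }
  const imageParts = images.map(
    (image): SourceImagePart => ({ type: 'image', image }),
  );
  return {
    role: 'user',
    // Instruction first, then the images in the order they were supplied.
    content: [{ type: 'text', text: instruction }, ...imageParts],
  };
}

// Placeholder byte arrays standing in for a product photo and a scene photo.
const productPhoto = new Uint8Array([1, 2, 3]);
const scenePhoto = new Uint8Array([4, 5, 6]);

const fusionMessage = buildFusionMessage(
  'Place the product from the first image on the table in the second image.',
  [productPhoto, scenePhoto],
);
```

Referring to the inputs by position ("the first image", "the second image") is a prompt-writing choice made here for clarity. The message would then be sent through generateText as messages: [fusionMessage], assuming the installed SDK accepts this shape.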
What is SynthID and are outputs watermarked?
SynthID is Google's invisible digital watermark technology. All images created or edited with Nano Banana (Gemini 2.5 Flash Image) include a SynthID watermark that allows them to be identified as AI-generated or AI-edited.
What makes this model's world knowledge capability distinct from prior image generation models?
Previous image generation models excelled at aesthetics but lacked deep semantic understanding. Nano Banana (Gemini 2.5 Flash Image) draws on Gemini's world knowledge to interpret hand-drawn diagrams, reason about real-world questions, and follow complex multi-step editing instructions in a single generation step.
What are the known limitations?
Known limitations at preview launch include long-form text rendering within images, character consistency reliability, and factual accuracy of fine image details. Google is actively improving these areas.