Google released Gemini 2.0 Flash on December 11, 2024 as the first model in the Gemini 2.0 generation, optimized for high-volume, high-frequency tasks at scale. It outperforms Gemini 1.5 Pro on key benchmarks at twice the speed, a significant capability leap over the previous Flash generation.
Gemini 2.0 Flash stands out through native multimodal output. Beyond accepting text, images, video, and audio as input, it produces natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. This eliminates the need for separate image-generation or speech-synthesis calls, enabling tighter integration within a single request.
Google simultaneously released the Multimodal Live API alongside 2.0 Flash for real-time and interactive applications. This API adds streaming real-time audio and video input, combined tool use, and the low-latency response characteristics conversational agents and live-session experiences need. Gemini 2.0 Flash also supports native tool use including Google Search, code execution, and user-defined functions for multi-step agentic workflows.
The context window of 1.0M tokens handles tasks that require reasoning over large codebases, lengthy documents, or extended conversation histories in a single pass.