FLUX.2 [klein] 9B sits at the quality-focused end of Black Forest Labs' Klein family. Black Forest Labs describes it as setting the Pareto frontier for quality versus latency across text-to-image, single-reference editing, and multi-reference generation.
The architecture starts with the text encoder. Klein 9B pairs the flow model with an 8B Qwen3 embedder, a large-language-model-class text encoder with substantially better prompt understanding than smaller CLIP-based encoders. This pairing is the primary reason Klein 9B follows complex, multi-clause prompts more accurately than competing small models.
The 9B flow model is step-distilled to four inference steps. Step distillation compresses a sampling trajectory that would otherwise take many denoising passes into a small number of high-quality ones. Each step is one forward pass through the flow model, so cutting the step count cuts latency roughly in proportion. The result is a model that generates images in under half a second while retaining the quality of a much larger, slower undistilled counterpart.
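To make the four-step idea concrete, here is a minimal sketch of a few-step Euler sampling loop for a flow model. It is not Klein's actual sampler or distillation procedure. The names `toy_velocity` and `sample_few_step` are hypothetical, the convention that time runs from 1.0 (noise) to 0.0 (image) is an assumption, and the toy velocity field stands in for the learned 9B network. It only illustrates why a model whose trajectories are close to straight lines can get away with very few steps.

```python
import random

def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned flow model.

    Returns the velocity of a straight line from `target` (at t=0)
    to the starting noise (at t=1), evaluated at the current point.
    """
    return [(xi - gi) / t for xi, gi in zip(x, target)]

def sample_few_step(velocity_fn, noise, num_steps=4):
    """Integrate from t=1.0 (noise) to t=0.0 (sample) with Euler steps."""
    timesteps = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = list(noise)
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = velocity_fn(x, t)  # one "forward pass" per step
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
target = [0.5, -1.0, 2.0]  # toy "clean image" of three values
noise = [rng.gauss(0.0, 1.0) for _ in target]
result = sample_few_step(lambda x, t: toy_velocity(x, t, target), noise, num_steps=4)
```

In this toy setting the path is exactly straight, so four Euler steps land on `target` up to floating-point error. A real undistilled model has curved trajectories and needs many more steps. Distillation trains the network so that a handful of steps is enough.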
Klein 9B handles text-to-image synthesis, image editing with a single reference, and multi-reference generation (combining multiple input images to blend concepts) within the same weights.
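One way to picture "the same weights for all three tasks" is a single request type whose task follows from how many reference images accompany the prompt. The sketch below is purely illustrative: `GenerationRequest` and `task_mode` are hypothetical names, not part of any Black Forest Labs API. It also assumes the task is determined by reference count alone.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GenerationRequest:
    """Hypothetical request: a prompt plus zero or more reference images."""
    prompt: str
    reference_images: List[str] = field(default_factory=list)  # e.g. file paths

def task_mode(request: GenerationRequest) -> str:
    """Name the task implied by the number of reference images."""
    count = len(request.reference_images)
    if count == 0:
        return "text-to-image"
    if count == 1:
        return "single-reference editing"
    return "multi-reference generation"

modes = [
    task_mode(GenerationRequest("a red bicycle")),
    task_mode(GenerationRequest("make the bicycle blue", ["bike.png"])),
    task_mode(GenerationRequest("the bicycle in this street", ["bike.png", "street.png"])),
]
```

Under this framing, no model switch happens between tasks. Only the conditioning inputs change.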