FLUX.2 [pro] is Black Forest Labs' launch model for the FLUX.2 generation. The FLUX.2 architecture departs from FLUX.1 by pairing a rectified-flow transformer with a Mistral-3 24B vision-language model (VLM), which serves as the text-conditioning component, along with a new VAE. The VLM handles semantic understanding of complex prompts and reference inputs, while the flow transformer handles image synthesis. Together, they enable FLUX.2's improved handling of reference images.
The multi-reference capability accepts up to 10 input images to guide a single generation. In practice, you can supply a product photo, a lighting reference, a composition reference, and a style example together, and the model combines them into one coherent new image. FLUX.2 [pro] also supports exact color matching, producing output that hits specific hex values or matches sampled reference palettes. This directly benefits brand asset production, where color accuracy is non-negotiable.
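To illustrate how a client might package these inputs, here is a minimal Python sketch. The class `MultiReferenceRequest`, its field names, and the choice to pass hex targets as plain prompt text are all hypothetical assumptions for illustration, not Black Forest Labs' published API. The sketch only enforces the two constraints stated above: at most 10 reference images, and well-formed hex color targets.

```python
import re
from dataclasses import dataclass, field

# Limit taken from the text above: up to 10 reference images per generation.
MAX_REFERENCES = 10
# Six-digit hex color such as "#1A73E8" (assumption: shorthand "#FFF" is rejected).
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


@dataclass
class MultiReferenceRequest:
    """Hypothetical request shape; field names are illustrative, not BFL's API."""

    prompt: str
    reference_images: list[str] = field(default_factory=list)  # paths or URLs
    brand_colors: list[str] = field(default_factory=list)  # target hex values

    def validate(self) -> None:
        """Raise ValueError if the request breaks a limit stated in the text."""
        if len(self.reference_images) > MAX_REFERENCES:
            raise ValueError(
                f"{len(self.reference_images)} reference images supplied; "
                f"at most {MAX_REFERENCES} are accepted"
            )
        for color in self.brand_colors:
            if not HEX_COLOR.fullmatch(color):
                raise ValueError(f"not a six-digit hex color: {color!r}")

    def to_payload(self) -> dict:
        """Return a JSON-ready dict, folding hex targets into the prompt text."""
        self.validate()
        prompt = self.prompt
        if self.brand_colors:
            # Assumption: color targets are expressed in the prompt itself.
            prompt += " Use exactly these colors: " + ", ".join(self.brand_colors) + "."
        return {"prompt": prompt, "reference_images": list(self.reference_images)}


if __name__ == "__main__":
    request = MultiReferenceRequest(
        prompt="Studio shot of the sneaker on a marble plinth.",
        reference_images=["product.jpg", "lighting.jpg", "composition.jpg", "style.jpg"],
        brand_colors=["#1A73E8"],
    )
    print(request.to_payload())
```

Rejecting an eleventh image on the client side surfaces the limit before any network round trip. The payload keys would still need to be mapped onto whatever field names the real endpoint expects.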
With output resolution of up to 4 megapixels, FLUX.2 [pro] is the entry point to the FLUX.2 generation for teams whose workflows need the multi-reference architecture. FLUX.2 Max raises the quality ceiling further, while FLUX.2 [pro] remains the baseline that introduced the VLM-conditioned, multi-reference approach.
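Because a megapixel limit caps width × height rather than either side alone, the usable dimensions at a given aspect ratio take a little arithmetic. This Python sketch solves for the largest frame that fits a pixel budget. The function `fit_to_budget`, the default budget of 4,000,000 pixels, and the snap to multiples of 16 are assumptions for illustration, not published FLUX.2 constraints. Pass `max_pixels=4 * 1024 * 1024` if you read "4 megapixels" in binary units.

```python
import math


def fit_to_budget(
    aspect_w: int, aspect_h: int, max_pixels: int = 4_000_000, multiple: int = 16
) -> tuple[int, int]:
    """Largest (width, height) at the given aspect ratio within max_pixels.

    Assumptions (not documented FLUX.2 constraints): "4 megapixels" is read
    as 4,000,000 pixels, and both sides are snapped down to a multiple of 16.
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio terms must be positive")
    ratio = aspect_w / aspect_h
    # w * h = max_pixels with w = ratio * h gives h = sqrt(max_pixels / ratio).
    height = math.sqrt(max_pixels / ratio)
    width = height * ratio
    # Rounding each side down keeps the product within the budget.
    w = int(width) // multiple * multiple
    h = int(height) // multiple * multiple
    return w, h


if __name__ == "__main__":
    for aspect in [(1, 1), (16, 9), (3, 2)]:
        w, h = fit_to_budget(*aspect)
        print(aspect, (w, h), f"{w * h / 1e6:.2f} MP")
```

Snapping each side down, rather than rounding to the nearest multiple, gives up a few percent of the budget in exchange for a guarantee that the result never exceeds it.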