FLUX.2 [flex] belongs to the FLUX.2 family from Black Forest Labs. While other FLUX.2 variants are tuned for a fixed quality level, Flex gives you direct control over two generation parameters: the number of inference steps and the guidance scale.
The FLUX.2 architecture pairs a Mistral-3 24B vision-language model (VLM) with a rectified flow transformer trained on the latent space of a new variational autoencoder (VAE). The VLM contributes real-world knowledge and contextual understanding, producing more coherent scenes, correct spatial logic, and reliable typography. FLUX.2 supports multi-reference input with up to 10 reference images, as well as image editing at resolutions up to 4 megapixels.
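To make the two Flex parameters concrete, here is a toy sketch of how a rectified flow model is typically sampled. Starting from noise, a sampler integrates a learned velocity field over a fixed number of steps, and classifier-free guidance blends a prompt-conditioned prediction with an unconditional one. This is a generic illustration, not FLUX.2's actual sampler: the scalar velocity functions stand in for the transformer, plain Euler steps stand in for whatever scheduler Black Forest Labs uses, and Flex may apply its guidance scale differently (for example, through a distilled guidance input rather than two forward passes). The names `guided_velocity`, `euler_sample`, `v_cond`, and `v_uncond` are hypothetical.

```python
import math
from typing import Callable

# Velocity field signature: (x, t) -> dx/dt. In a real model this would be a
# transformer forward pass over image latents; here it is a toy scalar function.
Velocity = Callable[[float, float], float]


def guided_velocity(v_cond: Velocity, v_uncond: Velocity, guidance: float) -> Velocity:
    """Classic classifier-free guidance: push the prediction away from the unconditional one."""
    def v(x: float, t: float) -> float:
        vu = v_uncond(x, t)
        return vu + guidance * (v_cond(x, t) - vu)
    return v


def euler_sample(v: Velocity, x0: float, steps: int) -> float:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 using `steps` Euler steps."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    x, h = x0, 1.0 / steps
    for i in range(steps):
        t = i * h
        # Each step calls v once; with guidance that means two model
        # evaluations (conditional + unconditional) per step.
        x = x + h * v(x, t)
    return x


# Hypothetical toy stand-ins: each pulls x toward a fixed target value.
def v_cond(x: float, t: float) -> float:
    return 3.0 - x  # "prompt-conditioned" target at 3.0


def v_uncond(x: float, t: float) -> float:
    return 0.0 - x  # "unconditional" target at 0.0


if __name__ == "__main__":
    v = guided_velocity(v_cond, v_uncond, guidance=2.0)
    # With guidance 2.0 the combined field is (6.0 - x), so the exact ODE
    # solution from x0 = 0.0 reaches 6.0 * (1 - e^-1) at t = 1.
    exact = 6.0 * (1 - math.exp(-1))
    for steps in (4, 16, 64):
        approx = euler_sample(v, x0=0.0, steps=steps)
        print(f"steps={steps:3d}  x(1)={approx:.4f}  error={abs(approx - exact):.4f}")
```

Running it prints the end state for 4, 16, and 64 steps; the error against the exact solution shrinks as `steps` grows, which is the speed-versus-fidelity tradeoff Flex exposes. Raising `guidance` pushes the result further in the direction of the conditioned target, loosely analogous to stronger prompt adherence.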
Flex's role in the lineup is explicit parameter control. If you build workflows that span multiple quality tiers (a quick draft view and a high-fidelity export, for example), you can use one Flex endpoint and change the steps value instead of switching models. That makes Flex a natural fit for tools, pipelines, and interfaces that expose a quality dial to end users.
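A minimal sketch of the one-endpoint pattern, assuming a JSON request body: quality tiers become presets that change only `steps` and `guidance`, so the same prompt and model serve both the draft and the export. The field names (`prompt`, `steps`, `guidance`, `width`, `height`), the tier names, the helper `build_flex_request`, and the numeric values are illustrative assumptions, not Black Forest Labs' documented API; check the official parameter names and allowed ranges before sending real requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityTier:
    steps: int
    guidance: float


# Illustrative values only (assumption): verify the provider's documented ranges.
TIERS = {
    "draft": QualityTier(steps=6, guidance=3.5),
    "standard": QualityTier(steps=28, guidance=4.5),
    "final": QualityTier(steps=50, guidance=4.5),
}


def build_flex_request(prompt: str, tier: str, width: int = 1024, height: int = 1024) -> dict:
    """Return a body for a hypothetical Flex endpoint; only steps/guidance vary by tier."""
    try:
        preset = TIERS[tier]
    except KeyError:
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(TIERS)}") from None
    return {
        "prompt": prompt,
        "width": width,
        "height": height,
        "steps": preset.steps,
        "guidance": preset.guidance,
    }


if __name__ == "__main__":
    draft = build_flex_request("a red bicycle against a brick wall", "draft")
    final = build_flex_request("a red bicycle against a brick wall", "final")
    print(draft)
    print(final)
```

Because only the preset differs between the draft and final bodies, a UI quality dial can map directly onto `TIERS` without branching on model names; the body would then be sent with whatever HTTP client and endpoint URL the provider documents.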