Released in October 2024 alongside its 3B sibling, Ministral 8B is the larger of Mistral AI's two "les Ministraux" edge models. What sets Ministral 8B apart is its architecture: an interleaved sliding-window attention mechanism engineered for inference speed and memory efficiency.
Standard full-attention transformers require every token to attend to every other token, so compute and memory scale quadratically with sequence length. Sliding-window attention restricts each token to a fixed-size window of recent tokens, so cost grows only linearly with sequence length for a given window. The interleaved design alternates windowed layers with full-attention layers, preserving the ability to reason over long-range dependencies while keeping the memory footprint practical.
Ministral 8B supports a context window of up to 128K tokens, along with function calling, knowledge retrieval, and commonsense reasoning.
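Function calling in chat APIs is typically driven by JSON-schema tool definitions that the model can choose to invoke. The snippet below sketches one such definition; the `get_weather` tool and its fields are hypothetical examples, not part of any official Ministral documentation.

```python
import json

# Hypothetical tool definition in the JSON-schema style commonly used
# by function-calling chat APIs. The function name and parameters are
# illustrative assumptions, not an official Ministral example.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# A client would pass a list of such definitions with the chat request;
# the model then replies with the tool name and JSON arguments to call.
print(json.dumps(get_weather_tool, indent=2))
```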
Ministral 8B carries dual licensing: the Mistral AI Commercial License for production use and the Mistral AI Research License for non-commercial work. This dual option offers more flexibility than the 3B variant, which launched as commercial-only.