
Ministral 8B

Ministral 8B brings an interleaved sliding-window attention architecture to edge inference, making processing faster and more memory-efficient across its full 128K-token context window. Pricing is $0.15 per million tokens.
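To put the price in concrete terms, here is a small TypeScript sketch that converts token counts into dollars. It assumes a single flat rate of $0.15 per million tokens for both input and output, which this page does not spell out; `estimateCostUsd` is a hypothetical helper, not part of any SDK.

```typescript
// Assumption: one flat rate for input and output tokens alike.
const USD_PER_MILLION_TOKENS = 0.15

// Hypothetical helper: total tokens, scaled to millions, times the rate.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  ratePerMillion: number = USD_PER_MILLION_TOKENS,
): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError('token counts must be non-negative')
  }
  return ((inputTokens + outputTokens) / 1_000_000) * ratePerMillion
}
```

Under that assumption, a prompt that fills a 128,000-token window plus a 1,000-token reply totals 129,000 tokens, or roughly $0.019.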

Tool Use
index.ts
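import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/ministral-8b',
  prompt: 'Why is the sky blue?',
})
// Print the response incrementally as tokens stream in
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}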

About Ministral 8B

Released October 1, 2024, Ministral 8B sits between the 3B and 14B models in Mistral AI's edge lineup. What sets it apart is its architecture: an interleaved sliding-window attention mechanism engineered for inference speed and memory efficiency.

Standard full-attention transformers have every token attend to every earlier token, so attention compute grows quadratically with sequence length and the key-value cache grows linearly with it. Sliding-window attention restricts each token to a fixed number of recent tokens, which caps both the per-token attention cost and the cache size for that layer. The interleaved design alternates full-attention layers with windowed ones, preserving the ability to reason over long-range dependencies while keeping the memory footprint practical.
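The difference between the two layer types can be made concrete with a toy model. The TypeScript sketch below is illustrative and is not Mistral's implementation: it defines a causal sliding-window mask, counts how many query-key pairs each layer type scores, and tallies cached keys for an interleaved stack. The window size and the "one full layer in every `period` layers" schedule are hypothetical parameters chosen for readability, not Ministral 8B's actual configuration.

```typescript
// Hypothetical model of a layer stack; parameters are illustrative only.
type LayerKind = 'full' | 'window'

// Causal mask: query q may see key k only if k is not in the future,
// and, for windowed layers, only if k is among the last `window` positions.
function canAttend(q: number, k: number, kind: LayerKind, window: number): boolean {
  if (k > q) return false
  return kind === 'full' || q - k < window
}

// Number of (query, key) pairs one layer scores over a sequence of length n.
function attendedPairs(n: number, kind: LayerKind, window: number): number {
  let pairs = 0
  for (let q = 0; q < n; q++) {
    pairs += kind === 'full' ? q + 1 : Math.min(q + 1, window)
  }
  return pairs
}

// Interleave: one full-attention layer, then (period - 1) windowed layers, repeating.
function layerSchedule(numLayers: number, period: number): LayerKind[] {
  return Array.from({ length: numLayers }, (_, i): LayerKind => (i % period === 0 ? 'full' : 'window'))
}

// Keys a layer must keep cached after processing n tokens.
function kvCacheLength(n: number, kind: LayerKind, window: number): number {
  return kind === 'full' ? n : Math.min(n, window)
}

// Total cached keys across the whole stack.
function totalKvEntries(n: number, schedule: LayerKind[], window: number): number {
  return schedule.reduce((sum, kind) => sum + kvCacheLength(n, kind, window), 0)
}
```

With 8 tokens and a window of 3, a full-attention layer scores 36 query-key pairs while a windowed layer scores 24. As sequences grow, the full count keeps growing quadratically while the windowed count grows only linearly (about n × window). For an 8-token sequence, interleaving one full layer per four keeps 34 cached keys across eight layers, versus 64 if every layer attended fully.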

Ministral 8B can use its full 128K-token context window and supports function calling, knowledge retrieval, and commonsense reasoning.
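Function calling through the AI SDK could look roughly like the sketch below. It assumes AI SDK 5 (where `tool()` takes an `inputSchema` and `stopWhen: stepCountIs(n)` limits multi-step generation), the `zod` package for the argument schema, and gateway credentials supplied through an `AI_GATEWAY_API_KEY` environment variable. The `weather` tool and its hard-coded temperatures are hypothetical placeholders for a real data source.

```typescript
import { generateText, stepCountIs, tool } from 'ai'
import { z } from 'zod'

// Hypothetical stand-in for a real weather service.
const FAKE_TEMPERATURES_C: Record<string, number> = { paris: 18, tokyo: 24 }

function lookupTemperatureC(city: string): number | null {
  return FAKE_TEMPERATURES_C[city.trim().toLowerCase()] ?? null
}

// A tool the model may choose to call; the SDK validates arguments against inputSchema.
const weather = tool({
  description: 'Get the current temperature in Celsius for a city',
  inputSchema: z.object({
    city: z.string().describe('Name of the city, e.g. "Paris"'),
  }),
  execute: async ({ city }) => ({ city, temperatureC: lookupTemperatureC(city) }),
})

async function main(): Promise<void> {
  const { text, steps } = await generateText({
    model: 'mistral/ministral-8b',
    tools: { weather },
    // Allow a second step so the model can answer using the tool result.
    stopWhen: stepCountIs(2),
    prompt: 'What is the temperature in Paris right now?',
  })
  console.log(text)
  console.log(`steps taken: ${steps.length}`)
}

// Only call the model when credentials are configured (assumed env var name).
if (process.env.AI_GATEWAY_API_KEY) {
  main().catch((err) => {
    console.error(err)
    process.exitCode = 1
  })
}
```

The SDK sends each tool's description and schema to the model; when the model requests a call, the SDK validates the arguments, runs `execute`, and feeds the result back for the next step.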

Ministral 8B carries dual licensing: the Mistral AI Commercial License for production and the Mistral AI Research License for non-commercial work. This gives it more licensing flexibility than the 3B variant.