Ministral 8B
Ministral 8B brings an interleaved sliding-window attention architecture to edge inference. It is designed to be faster and more memory-efficient than full-attention models of similar size across its 128K-token context window, and is priced at $0.15 per million tokens.
import { streamText } from 'ai'
const result = streamText({ model: 'mistral/ministral-8b', prompt: 'Why is the sky blue?' })
Playground
Try out Ministral 8B by Mistral AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
- Throughput: P50 on live AI Gateway traffic, in tokens per second (TPS).
- Time to first token (TTFT): P50 on live AI Gateway traffic, in milliseconds.
- Success rate: Direct requests on AI Gateway and per provider.
Visit the docs for more info.
About Ministral 8B
Released October 1, 2024, Ministral 8B sits between the 3B and 14B models in Mistral AI's edge lineup. What sets Ministral 8B apart is its architecture: an interleaved sliding-window attention mechanism engineered for inference speed and memory efficiency.
Standard full-attention transformers let every token attend to every other token, so attention cost grows quadratically with sequence length. Sliding-window attention restricts each token to a fixed-size window of nearby tokens, which cuts memory usage. The interleaved design alternates between full-attention and windowed layers, preserving the ability to reason over long-range dependencies while keeping the memory footprint practical.
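To make the difference concrete, here is a minimal sketch that counts how many query/key pairs a single causal attention layer scores with and without a window. The function names, the sequence length of 8, and the window size of 3 are illustrative assumptions chosen for readability. They are not Ministral 8B's actual configuration.

```typescript
// Count the (query, key) pairs a causal attention layer scores.
// Omitting windowSize means full causal attention.
function attendedPairs(seqLen: number, windowSize?: number): number {
  let pairs = 0;
  for (let q = 0; q < seqLen; q++) {
    // A causal query at position q can see keys 0..q, i.e. q + 1 keys.
    const visible = q + 1;
    pairs += windowSize === undefined ? visible : Math.min(visible, windowSize);
  }
  return pairs;
}

// Boolean mask for a small sequence: mask[q][k] is true when query q may
// attend to key k (causal, and at most windowSize - 1 positions back).
function slidingWindowMask(seqLen: number, windowSize: number): boolean[][] {
  return Array.from({ length: seqLen }, (_row, q) =>
    Array.from({ length: seqLen }, (_col, k) => k <= q && q - k < windowSize),
  );
}

// Illustrative values only (assumptions, not the model's real settings).
const demoSeqLen = 8;
const demoWindow = 3;

console.log(attendedPairs(demoSeqLen));             // full causal attention
console.log(attendedPairs(demoSeqLen, demoWindow)); // sliding-window attention
console.log(slidingWindowMask(demoSeqLen, demoWindow)[5]);
```

The full-attention count grows with the square of the sequence length, while the windowed count grows roughly linearly once the sequence is longer than the window.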
Ministral 8B offers a context window of 128K tokens and supports function calling, knowledge retrieval, and commonsense reasoning.
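As a rough illustration of what function calling involves on the application side, the sketch below declares one tool in the JSON Schema style used by OpenAI-compatible chat APIs and dispatches a tool call to a local handler. The `getWeather` tool, its stubbed handler, and the `ToolCall` shape are hypothetical and exist only for this example. The exact payload a given provider returns may differ.

```typescript
// A tool definition in the JSON Schema "tools" style.
interface ToolDefinition {
  type: 'function';
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

// Assumed shape of a tool call returned by the model:
// arguments arrive as a JSON-encoded string.
interface ToolCall {
  name: string;
  arguments: string;
}

// Hypothetical tool, defined only for this sketch.
const getWeatherTool: ToolDefinition = {
  type: 'function',
  function: {
    name: 'getWeather',
    description: 'Look up the current weather for a city.',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
};

// Local implementations keyed by tool name (stubbed, no network access).
const handlers: Record<string, (args: Record<string, unknown>) => string> = {
  getWeather: (args) => `Sunny in ${String(args['city'])}`,
};

// Parse the model's arguments and run the matching local handler.
function runToolCall(call: ToolCall): string {
  const handler = handlers[call.name];
  if (!handler) throw new Error(`Unknown tool: ${call.name}`);
  return handler(JSON.parse(call.arguments) as Record<string, unknown>);
}

console.log(getWeatherTool.function.name);
console.log(runToolCall({ name: 'getWeather', arguments: '{"city":"Paris"}' }));
```

In a real application the handler's return value would be sent back to the model as a tool result so it can compose the final answer.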
Ministral 8B carries dual licensing: the Mistral AI Commercial License for production and the Mistral AI Research License for non-commercial work. This offers more flexibility than the 3B variant.
What To Consider When Choosing a Provider
- Configuration: For workloads processing long documents or extended conversation histories, Ministral 8B's sliding-window architecture reduces the memory pressure typical of long-context inference.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
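The authentication point above can be sketched as a small helper that builds an API-key request without sending it. The endpoint URL is assumed to be AI Gateway's OpenAI-compatible chat completions route, and `buildGatewayRequest` is a hypothetical helper name. Confirm both the URL and the request body against the AI Gateway documentation before relying on them.

```typescript
interface GatewayRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Assumption: OpenAI-compatible chat completions endpoint of AI Gateway.
const GATEWAY_URL = 'https://ai-gateway.vercel.sh/v1/chat/completions';

// Build (but do not send) a request authenticated with a gateway API key.
// No provider credentials appear anywhere in the request.
function buildGatewayRequest(apiKey: string, prompt: string): GatewayRequest {
  if (!apiKey) throw new Error('Missing AI Gateway API key');
  return {
    url: GATEWAY_URL,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'mistral/ministral-8b',
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

const example = buildGatewayRequest('example-key', 'Why is the sky blue?');
console.log(example.url);
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`. Keeping construction separate from sending makes the helper easy to test without network access.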
When to Use Ministral 8B
Best For
- Long-context processing: Sliding-window attention keeps memory footprint manageable when processing long inputs
- Deeper reasoning than 3B: Tasks requiring more depth than Ministral 3B can provide
- Function calling and tool use: More accurate than the 3B variant
- Research and commercial use: Dual licensing covers non-commercial work under the Research License and production under the Commercial License
Consider Alternatives When
- Smallest footprint and lowest cost: You need the absolute minimum (consider Ministral 3B)
- Image understanding: Vision is required (consider Ministral 14B)
Conclusion
Ministral 8B earns its place through architectural innovation rather than just parameter scaling. The sliding-window attention design makes long-context inference more memory-efficient than standard transformers at this size.
Frequently Asked Questions
What exactly is interleaved sliding-window attention?
It alternates between layers that use full attention (every token sees every other token) and layers that use windowed attention (each token only sees nearby tokens). This combination preserves long-range reasoning while dramatically reducing memory consumption.
How does Ministral 8B compare to standard 8B transformers on inference speed?
Mistral AI designed the sliding-window pattern specifically to outperform standard architectures on speed and memory at this scale. The exact advantage depends on hardware and serving stack, but the architectural benefit is most pronounced on long sequences.
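To see why the benefit grows with sequence length, the sketch below estimates how many key/value positions a layer stack has to cache under an interleaved pattern, relative to an all-full-attention stack. Every number here (32 layers, a 4,096-token window, one full-attention layer in every four) is a hypothetical configuration picked for illustration, and the model assumes windowed layers keep only the most recent `windowSize` positions. None of it is taken from Ministral 8B's published configuration.

```typescript
// Positions a single layer must keep cached after seqLen tokens.
// Windowed layers are assumed to keep only the last windowSize positions.
function cachedPositions(seqLen: number, windowSize?: number): number {
  return windowSize === undefined ? seqLen : Math.min(seqLen, windowSize);
}

// Total cached positions when every fullEvery-th layer uses full attention
// and all other layers use a sliding window.
function interleavedCache(
  seqLen: number,
  numLayers: number,
  windowSize: number,
  fullEvery: number,
): number {
  let total = 0;
  for (let layer = 0; layer < numLayers; layer++) {
    const isFull = layer % fullEvery === 0;
    total += cachedPositions(seqLen, isFull ? undefined : windowSize);
  }
  return total;
}

// Interleaved cache size as a fraction of an all-full-attention stack.
function cacheRatio(
  seqLen: number,
  numLayers: number,
  windowSize: number,
  fullEvery: number,
): number {
  return interleavedCache(seqLen, numLayers, windowSize, fullEvery) / (seqLen * numLayers);
}

// Hypothetical configuration (assumptions, not the real model).
for (const seqLen of [2048, 16384, 131072]) {
  console.log(seqLen, cacheRatio(seqLen, 32, 4096, 4));
}
```

Under these assumptions a short sequence sees no savings because it fits entirely inside the window, while the ratio falls steadily as the sequence grows toward the full context length.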
What licensing does Ministral 8B offer that the 3B does not?
Ministral 8B includes both the Mistral AI Commercial License and the Mistral AI Research License. The 3B variant's licensing is more limited.
What is the context window of Ministral 8B?
128K tokens.
When should I choose Ministral 8B over scaling up to a larger model like Mistral Small?
Choose Ministral 8B when you need the memory efficiency that sliding-window attention provides on long inputs. If the workload needs more capability than an 8B model offers, a larger model may deliver more capability per dollar.