Ministral 8B
Ministral 8B brings an interleaved sliding-window attention architecture to edge inference, delivering faster and more memory-efficient processing across its full 128K-token context window, priced at $0.15 per million tokens.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/ministral-8b',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
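Because the gateway handles provider credentials for you, your code only needs to supply its own gateway key or token. A minimal sketch of that resolution step, assuming the key is exposed through an `AI_GATEWAY_API_KEY` environment variable (the variable name is an assumption for illustration; check the gateway documentation for the exact name in your setup):

```typescript
// Sketch: resolving the gateway credential before making a request.
// AI_GATEWAY_API_KEY is an assumed variable name, not a confirmed one.
function resolveAuthHeader(env: Record<string, string | undefined>): string {
  const key = env['AI_GATEWAY_API_KEY']
  if (!key) {
    throw new Error('Missing gateway API key or OIDC token')
  }
  return `Bearer ${key}`
}

console.log(resolveAuthHeader({ AI_GATEWAY_API_KEY: 'sk-test' })) // Bearer sk-test
```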
For workloads processing long documents or extended conversation histories, Ministral 8B's sliding-window architecture reduces the memory pressure typical of long-context inference.
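For budgeting such long-context workloads, the listed $0.15 per million token rate makes back-of-envelope input cost easy to estimate (this sketch uses only the rate stated on this page and assumes it applies to input tokens; output pricing may differ):

```typescript
// Sketch: input-cost estimate at the listed $0.15 per million token rate.
const PRICE_PER_MILLION_TOKENS = 0.15

function estimateInputCost(tokens: number): number {
  return (tokens / 1_000_000) * PRICE_PER_MILLION_TOKENS
}

// A request that fills the full 128K-token context window:
console.log(estimateInputCost(128_000).toFixed(4)) // 0.0192
```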
When to Use Ministral 8B
Best For
Long-context processing: Sliding-window attention keeps the memory footprint manageable when processing long inputs
Deeper reasoning than 3B: Tasks requiring more depth than Ministral 3B can provide
Function calling and tool use: Better accuracy than the 3B variant
Dual-licensed research and commercial use cases: Covered by both the Commercial and Research licenses
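The function-calling pattern mentioned above can be sketched generically. The `Tool` type and `getWeather` tool below are simplified, hypothetical stand-ins for illustration, not any specific SDK's actual API:

```typescript
// Sketch: the shape of a tool the model can be asked to call.
// A real implementation would call an external API in execute().
type Tool<In, Out> = {
  description: string
  execute: (input: In) => Out
}

const getWeather: Tool<{ city: string }, { city: string; tempC: number }> = {
  description: 'Look up the current weather for a city',
  // Stub value; a real tool would fetch live data here.
  execute: ({ city }) => ({ city, tempC: 21 }),
}

console.log(getWeather.execute({ city: 'Paris' })) // { city: 'Paris', tempC: 21 }
```

In practice the model receives the tool descriptions, decides when to call one, and your runtime executes the matching function and feeds the result back.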
Consider Alternatives When
Smallest footprint and lowest cost: You need the absolute minimum (consider Ministral 3B)
Image understanding: Vision is required (consider Ministral 14B)
Conclusion
Ministral 8B earns its place through architectural innovation rather than just parameter scaling. The sliding-window attention design makes long-context inference more memory-efficient than standard transformers at this size.
FAQ
How does interleaved sliding-window attention work?
It alternates between layers that use full attention (every token sees every other token) and layers that use windowed attention (each token only sees nearby tokens). This combination preserves long-range reasoning while dramatically reducing memory consumption.
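The alternating pattern can be illustrated with a toy attention-mask predicate. The window size and function name here are illustrative only; the model's real window is far larger:

```typescript
// Sketch: which positions token i may attend to in each layer type.
// Full-attention layers see all earlier tokens; windowed layers see
// only the previous `window` tokens, which bounds per-layer KV memory.
function canAttend(i: number, j: number, windowed: boolean, window = 4): boolean {
  if (j > i) return false // causal: never attend to future tokens
  if (!windowed) return true // full-attention layer
  return i - j < window // sliding-window layer
}

// Token 10 in a windowed layer (window 4) sees tokens 7..10 only:
console.log(canAttend(10, 7, true)) // true
console.log(canAttend(10, 6, true)) // false
console.log(canAttend(10, 0, false)) // true (full-attention layer)
```

Because the windowed layers' key-value cache is bounded by the window size rather than the sequence length, memory grows much more slowly with context length than in an all-full-attention transformer.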
How much faster is it than a standard transformer?
Mistral AI designed the sliding-window pattern specifically to outperform standard architectures on speed and memory at this scale. The exact advantage depends on hardware and serving stack, but the architectural benefit is most pronounced on long sequences.
How is Ministral 8B licensed?
Ministral 8B is available under both the Mistral AI Commercial License and the Mistral AI Research License. The 3B variant's licensing is more limited.
What is the maximum context window?
128K tokens.
Choose Ministral 8B when you need the memory efficiency that sliding-window attention provides. For workloads that demand more raw capability, a larger model may deliver better results per dollar.