Ministral 8B
Ministral 8B brings an interleaved sliding-window attention architecture to edge inference, delivering faster, more memory-efficient processing across its full 128K-token context window, priced at $0.15 per million tokens.
import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/ministral-8b',
  prompt: 'Why is the sky blue?',
})

Frequently Asked Questions
What exactly is interleaved sliding-window attention?
It alternates between layers that use full attention (every token sees every other token) and layers that use windowed attention (each token only sees nearby tokens). This combination preserves long-range reasoning while dramatically reducing memory consumption.
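As a rough sketch of how such a mask could look, the TypeScript below alternates full-attention and windowed layers; the window size and the even/odd layer split are illustrative assumptions, not Ministral 8B's actual internal configuration.

// Illustrative interleaved causal attention mask: even layers use full
// attention, odd layers use a sliding window. Numbers are assumptions.
function canAttend(layer: number, q: number, k: number, windowSize = 4096): boolean {
  if (k > q) return false          // causal: never attend to future tokens
  if (layer % 2 === 0) return true // full-attention layer: the whole prefix is visible
  return q - k < windowSize        // windowed layer: only the last `windowSize` tokens
}

// A windowed layer cannot reach back 10,000 tokens, but a full-attention layer can.
canAttend(1, 10_000, 0) // false
canAttend(0, 10_000, 0) // true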
How does Ministral 8B compare to standard 8B transformers on inference speed?
Mistral AI designed the sliding-window pattern specifically to outperform standard architectures on speed and memory at this scale. The exact advantage depends on hardware and serving stack, but the architectural benefit is most pronounced on long sequences.
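One way to see why long sequences benefit most is the KV cache: a full-attention layer must keep keys and values for every token processed so far, while a sliding-window layer only keeps the most recent window. A minimal sketch, using illustrative head counts and window size rather than Ministral 8B's published internals:

// Approximate KV-cache size per layer in bytes; all dimensions are illustrative.
function kvCacheBytes(seqLen: number, windowSize: number | null,
    numKvHeads = 8, headDim = 128, bytesPerValue = 2): number {
  const cachedTokens = windowSize === null ? seqLen : Math.min(seqLen, windowSize)
  return cachedTokens * numKvHeads * headDim * 2 /* keys + values */ * bytesPerValue
}

kvCacheBytes(128_000, null) // full attention: cache grows with the entire sequence
kvCacheBytes(128_000, 4096) // sliding window: cache is capped at the window size

The windowed layers' cache stops growing once the sequence exceeds the window, which is where the memory savings on long inputs come from.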
What licensing does Ministral 8B offer that the 3B does not?
Ministral 8B is offered under both the Mistral AI Commercial License and the Mistral AI Research License. The 3B variant's licensing is more limited.
What is the context window of Ministral 8B?
128K tokens.
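If you want a quick pre-flight check that an input will fit, a character-count heuristic is often enough; the four-characters-per-token ratio below is a rough approximation, not Ministral's tokenizer.

// Rough check against the 128K-token context window (heuristic, not a tokenizer).
const CONTEXT_WINDOW = 128_000

function roughlyFits(text: string, reservedForOutput = 2_000): boolean {
  const estimatedTokens = Math.ceil(text.length / 4)
  return estimatedTokens + reservedForOutput <= CONTEXT_WINDOW
}

roughlyFits('Why is the sky blue?') // true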
When should I choose Ministral 8B over scaling up to a larger model like Mistral Small?
Choose Ministral 8B when memory efficiency matters most: its sliding-window attention keeps inference lean, especially on long inputs. For workloads that demand more raw capability, a larger model may deliver better results for the cost.