
Ministral 8B

mistral/ministral-8b

Ministral 8B brings an interleaved sliding-window attention architecture to edge inference, delivering faster, more memory-efficient processing than standard full attention across its 128K-token context window, at $0.15 per million tokens.

index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/ministral-8b',
  prompt: 'Why is the sky blue?',
})
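
To read the streamed output, iterate the result's text stream. This continues the snippet above and assumes a Node.js script running as an ES module (for top-level await):

for await (const textPart of result.textStream) {
  process.stdout.write(textPart) // print each chunk as it arrives
}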

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly. A minimal configuration sketch follows this list.
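
If you prefer to pass a key explicitly rather than relying on the environment, a minimal sketch looks like the following. The createGateway helper from @ai-sdk/gateway and the AI_GATEWAY_API_KEY variable name are assumptions here; verify both against the gateway documentation for your SDK version.

import { createGateway } from '@ai-sdk/gateway'
import { streamText } from 'ai'

// Assumed helper and variable names; by default the SDK can also pick up
// credentials (an API key or OIDC token) from the environment.
const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY,
})

const result = streamText({
  model: gateway('mistral/ministral-8b'),
  prompt: 'Why is the sky blue?',
})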

For workloads processing long documents or extended conversation histories, Ministral 8B's sliding-window architecture reduces the memory pressure typical of long-context inference.

When to Use Ministral 8B

Best For

  • Long-context processing:

    Sliding-window attention keeps memory footprint manageable when processing long inputs

  • Deeper reasoning than 3B:

    Tasks requiring more depth than Ministral 3B can provide

  • Function calling and tool use:

    With better accuracy than the 3B variant (see the sketch after this list)

  • Dual-licensed research use cases:

    Covered by both the Commercial and Research licenses
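
Because the gateway accepts standard AI SDK calls, tool use is an ordinary streamText request with a tools map. A minimal sketch: the getWeather tool, its schema, and the stubbed execute function are hypothetical, and the schema field is named inputSchema in recent AI SDK versions (parameters in older ones).

import { streamText, tool } from 'ai'
import { z } from 'zod'

const result = streamText({
  model: 'mistral/ministral-8b',
  prompt: 'What is the weather in Paris right now?',
  tools: {
    // Hypothetical tool, for illustration only
    getWeather: tool({
      description: 'Look up the current weather for a city',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, temperatureC: 21 }), // stubbed result
    }),
  },
})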

Consider Alternatives When

  • Smallest footprint and lowest cost:

    You need the absolute minimum (consider Ministral 3B)

  • Image understanding:

    Vision is required (consider Ministral 14B)

Conclusion

Ministral 8B earns its place through architectural innovation rather than just parameter scaling. The sliding-window attention design makes long-context inference more memory-efficient than standard transformers at this size.

FAQ

How does interleaved sliding-window attention work?

The model alternates between layers that use full attention (every token sees every other token) and layers that use windowed attention (each token only sees nearby tokens). This combination preserves long-range reasoning while dramatically reducing memory consumption.
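
As a rough illustration (the alternation pattern and window size here are assumptions for the sketch, not Ministral's actual configuration), the visibility rule for an interleaved mask could look like this:

// Illustrative sketch only: assumes even layers use full causal attention
// and odd layers use a sliding window of WINDOW tokens.
const WINDOW = 4096

// Whether the token at position `key` is visible to the token at `query` in a given layer.
function canAttend(layer: number, query: number, key: number): boolean {
  if (key > query) return false     // causal: never attend to future tokens
  if (layer % 2 === 0) return true  // full-attention layer: all earlier tokens visible
  return query - key < WINDOW       // windowed layer: only the most recent WINDOW tokens
}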

How much faster or more memory-efficient is it?

Mistral AI designed the sliding-window pattern specifically to outperform standard architectures on speed and memory at this scale. The exact advantage depends on hardware and serving stack, but the architectural benefit is most pronounced on long sequences, where windowed layers only need to cache keys and values for the most recent window of tokens rather than the full history.

How is Ministral 8B licensed?

Ministral 8B is available under both the Mistral AI Commercial License and the Mistral AI Research License. The 3B variant's licensing is more limited.

What context window does Ministral 8B support?

128K tokens.

When should I choose Ministral 8B over a larger model?

Choose Ministral 8B when you need the memory efficiency that sliding-window attention provides. For workloads that demand more capability, a larger model may deliver better value per unit of capability.