
MiniMax M2.1 Lightning

MiniMax M2.1 Lightning is the throughput-optimized variant of MiniMax-M2.1. It supports a context window of 204.8K tokens and a max output of 131.1K tokens per request.

Reasoning, Tool Use, Implicit Caching
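index.ts
// Stream a completion from MiniMax M2.1 Lightning through AI Gateway.
import { streamText } from 'ai'

const result = streamText({
  model: 'minimax/minimax-m2.1-lightning',
  prompt: 'Why is the sky blue?'
})

// Print the text chunks as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}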

Playground

Try out MiniMax M2.1 Lightning by MiniMax. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.


Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
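
As a rough sketch of what setting a provider preference might look like in code: the example below assumes AI Gateway reads a provider order from `providerOptions.gateway.order` in an AI SDK call, and that `minimax` is the provider slug. Both are assumptions rather than facts stated on this page, so copy the real slug from the table and check the docs. `buildRequestOptions` is a hypothetical helper.

```typescript
// Hypothetical helper: builds an options object for streamText/generateText.
// ASSUMPTION: AI Gateway reads provider preferences from
// providerOptions.gateway.order and tries providers in the listed order.
interface RequestOptions {
  model: string
  prompt: string
  providerOptions: { gateway: { order: string[] } }
}

function buildRequestOptions(prompt: string, providerOrder: string[]): RequestOptions {
  if (providerOrder.length === 0) {
    throw new Error('providerOrder must list at least one provider slug')
  }
  return {
    model: 'minimax/minimax-m2.1-lightning',
    prompt,
    // Copy the array so later changes by the caller do not affect the request.
    providerOptions: { gateway: { order: [...providerOrder] } },
  }
}

// 'minimax' is an assumed slug; copy the real one from the Providers table.
const options = buildRequestOptions('Why is the sky blue?', ['minimax'])
```

Under the same assumptions, the resulting object could be spread into a call such as `streamText({ ...options })`.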

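| Provider | Context | Latency | Throughput | Input | Output | Cache | Web Search (Per Query) | Capabilities | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MiniMax | 205K | 1.0s | 46 tps | $0.30/M | $2.40/M | Read: $0.03/M, Write: $0.38/M | - | +1 | | | 12/23/2025 |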
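
To make the per-million-token prices concrete, here is a minimal cost estimate using the figures from the MiniMax provider row above. The `TokenUsage` shape and the `estimateCostUSD` helper are hypothetical, and the sketch assumes cached tokens are billed separately from regular input tokens; actual billing rules may differ.

```typescript
// Prices in USD per one million tokens, copied from the MiniMax provider row.
const PRICE_PER_MILLION = {
  input: 0.30,
  output: 2.40,
  cacheRead: 0.03,
  cacheWrite: 0.38,
} as const

// Hypothetical usage record. ASSUMPTION: inputTokens excludes cached tokens.
interface TokenUsage {
  inputTokens: number
  outputTokens: number
  cacheReadTokens?: number
  cacheWriteTokens?: number
}

// Multiply each token count by its per-token price and sum the parts.
function estimateCostUSD(usage: TokenUsage): number {
  const perToken = (pricePerMillion: number): number => pricePerMillion / 1_000_000
  return (
    usage.inputTokens * perToken(PRICE_PER_MILLION.input) +
    usage.outputTokens * perToken(PRICE_PER_MILLION.output) +
    (usage.cacheReadTokens ?? 0) * perToken(PRICE_PER_MILLION.cacheRead) +
    (usage.cacheWriteTokens ?? 0) * perToken(PRICE_PER_MILLION.cacheWrite)
  )
}

// 10,000 input tokens and 2,000 output tokens:
// 10,000 x $0.30/M + 2,000 x $2.40/M = $0.0030 + $0.0048, about $0.0078.
const exampleCost = estimateCostUSD({ inputTokens: 10_000, outputTokens: 2_000 })
```

At these prices, output tokens cost eight times as much as input tokens, so response length tends to dominate the bill for generation-heavy workloads.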
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
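
The two metrics above can be approximated on the client side. The sketch below is not how AI Gateway computes its published numbers; it is an illustration, with hypothetical helper names, of measuring time to first chunk and the chunk rate for any async text stream, and of taking a P50 (median) over several samples. Stream chunks are not the same as tokens, so the rate is only a proxy for TPS.

```typescript
// Timing summary for one streamed response.
interface StreamTiming {
  ttftMs: number | null // time until the first chunk; null if the stream was empty
  totalMs: number
  chunks: number
  chunksPerSecond: number // rate over the generation phase, after the first chunk
}

// Consume an async iterable of text chunks and record when chunks arrive.
// `now` is injectable so the function can be driven by a fake clock in tests.
async function measureStream(
  stream: AsyncIterable<string>,
  now: () => number = Date.now
): Promise<StreamTiming> {
  const start = now()
  let firstAt: number | null = null
  let chunks = 0
  for await (const _chunk of stream) {
    if (firstAt === null) firstAt = now()
    chunks++
  }
  const end = now()
  const generationMs = firstAt === null ? 0 : end - firstAt
  return {
    ttftMs: firstAt === null ? null : firstAt - start,
    totalMs: end - start,
    chunks,
    // Chunks that arrived after the first one, divided by the time they took.
    chunksPerSecond: generationMs > 0 ? ((chunks - 1) * 1000) / generationMs : 0,
  }
}

// P50 is the median: the middle sample, or the mean of the two middle samples.
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error('p50 needs at least one sample')
  const sorted = [...samples].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}
```

With the AI SDK snippet from the top of this page, `measureStream(result.textStream)` would be one way to feed it a real stream.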

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by MiniMax

| Model | Context | Latency | Throughput | Input | Output | Cache | Web Search (Per Query) | Capabilities | Providers | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1M | 1.6s | 190 tps | $0.60/M, $0.30/M | $2.40/M, $1.20/M | Read: $0.12/M, $0.06/M; Write: - | - | +2 | fireworks, minimax | | | 05/31/2026 |
| | 205K | 0.2s | 188 tps | $0.15/M | $0.60/M | Read: $0.06/M, Write: $0.38/M | - | +1 | blackbox, fireworks, minimax, +2 | | | 03/18/2026 |
| | 205K | 1.1s | 42 tps | $0.60/M | $2.40/M | Read: $0.06/M, Write: $0.38/M | - | +1 | minimax | | | 03/18/2026 |
| | 1M | 0.5s | 141 tps | $0.07/M | $0.57/M | Read: $0.03/M, Write: $0.38/M | - | +1 | bedrock, blackbox, deepinfra, +3 | | | 02/12/2026 |
| | 205K | 0.9s | 100 tps | $0.30/M | $1.20/M | Read: $0.03/M, Write: $0.38/M | - | +1 | bedrock, minimax, novita | | | 12/23/2025 |
| | 205K | 0.8s | 62 tps | $0.30/M | $1.20/M | Read: $0.03/M, Write: $0.38/M | - | +1 | minimax, novita | | | 10/27/2025 |

About MiniMax M2.1 Lightning

MiniMax M2.1 Lightning shipped alongside M2.1 as its speed-optimized companion. The Lightning variant delivers faster inference while producing the same outputs as standard M2.1, so you don't trade quality for throughput.

The model supports the same programming languages as M2.1: Go, C++, JavaScript, C#, TypeScript, Rust, Java, Kotlin, and Objective-C. It also carries the same Interleaved Thinking capability and agentic tool-use support that define the 2.1 generation.

Automatic prompt caching is built in with no manual configuration. This further reduces effective latency for repeated or structurally similar prompts. MiniMax M2.1 Lightning suits developer tools, IDE assistants, and any application where users expect near-instantaneous code suggestions.
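
One way to give automatic caching a chance to help is to keep the repeated part of each prompt byte-for-byte identical and put it first. The sketch below assumes the cache matches on a shared prompt prefix, which is how prefix caches commonly work but is not confirmed by this page. `buildMessages` and the `ChatMessage` type are hypothetical.

```typescript
// Minimal chat message shape used for this illustration.
interface ChatMessage {
  role: 'system' | 'user'
  content: string
}

// Put the stable, reusable context first and the per-request question last.
// ASSUMPTION: requests that share an identical leading message are more
// likely to hit the cache than requests where the shared text comes later.
function buildMessages(sharedContext: string, question: string): ChatMessage[] {
  return [
    { role: 'system', content: sharedContext },
    { role: 'user', content: question },
  ]
}

// Two requests over the same codebase summary differ only in the final message.
const codebaseSummary = 'You are a coding assistant. Project conventions: ...'
const first = buildMessages(codebaseSummary, 'Rename this function.')
const second = buildMessages(codebaseSummary, 'Add a unit test for it.')
```

Avoid interpolating timestamps, request IDs, or other per-call values into the shared context, since any change there makes the leading text differ between requests.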

What To Consider When Choosing a Provider

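  • Configuration: For streaming use cases, time-to-first-token governs how quickly output starts and throughput governs how quickly it finishes. MiniMax M2.1 Lightning's throughput advantage applies to the latter, so longer responses complete sooner and the end-user experience feels more responsive.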
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
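
As an illustration of the authentication point, the helper below picks whichever credential is present and formats it as a bearer token. The environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` are assumptions not stated on this page, and `gatewayAuthHeaders` is a hypothetical helper; when calling through the AI SDK you may not need to build headers yourself.

```typescript
// A plain map of environment variables, so the helper is easy to test.
type Env = Record<string, string | undefined>

// Prefer an API key, fall back to an OIDC token, and fail loudly if neither
// is set. ASSUMPTION: both credential types are sent as a Bearer token.
function gatewayAuthHeaders(env: Env): Record<string, string> {
  const token = env.AI_GATEWAY_API_KEY || env.VERCEL_OIDC_TOKEN
  if (!token) {
    throw new Error('Set AI_GATEWAY_API_KEY or VERCEL_OIDC_TOKEN before calling the gateway')
  }
  return { Authorization: `Bearer ${token}` }
}
```

Passing the environment in as an argument, rather than reading a global, keeps provider credentials out of application code and makes the fallback order explicit.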

When to Use MiniMax M2.1 Lightning

Best For

  • Interactive developer tools: IDE plugins where response latency is user-visible
  • Real-time code completion: Inline suggestion features in web or IDE applications
  • High-throughput batch jobs: Faster tokens-per-second reduces job duration
  • Streaming user experiences: Applications that need low time-to-first-token
  • Drop-in speed upgrade: Teams already on M2.1 who want faster inference

Consider Alternatives When

  • Lowest cost required: If throughput is not a constraint, use standard M2.1
  • Architectural planning needed: Your tasks require the planning capabilities introduced in M2.5
  • Vision input required: M2.1 Lightning is text-only, so use a multimodal model when your workload includes image inputs

Conclusion

MiniMax M2.1 Lightning resolves the typical quality-vs-speed tradeoff by matching M2.1's output while running faster. It's a straightforward upgrade for any latency-sensitive application already using the 2.1 generation. Built-in prompt caching amplifies the speed benefit for repetitive context patterns.