
MiniMax M2.5 High Speed

MiniMax M2.5 High Speed is the throughput-optimized variant that retains M2.5's full planning and software engineering capabilities.

Reasoning · Tool Use · Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'minimax/minimax-m2.5-highspeed',
  prompt: 'Why is the sky blue?',
})

// Print tokens as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Playground

Try out MiniMax M2.5 High Speed by MiniMax. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About MiniMax M2.5 High Speed

MiniMax M2.5 High Speed targets autonomous coding agents that run for extended periods and need fast token generation. See live metrics on this page for current throughput. For cost estimates, combine the listed per-token rates with your expected token volumes rather than relying on a fixed hourly figure.

The "highspeed" label doesn't indicate a distilled or reduced-capability model. MiniMax M2.5 High Speed retains the full architectural planning mode of standard M2.5. It decomposes problems into specifications before writing code, handles the complete development lifecycle across Web, Android, iOS, Windows, and Mac platforms, and matches the same reported SWE-Bench Verified score as standard M2.5.

The tradeoff is straightforward: you pay roughly twice as much per token in exchange for generating tokens roughly twice as fast on paper. For batch jobs where wall-clock time doesn't matter, standard M2.5 is more economical. For interactive sessions, streaming UIs, and agent loops where latency compounds, the highspeed variant can win.
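As a rough illustration of that tradeoff, here is a minimal cost sketch. The highspeed rates are the list prices from the provider table below; the standard M2.5 rates are assumed to be roughly half, per the pricing note later on this page, and the monthly token volumes are hypothetical.

estimate-cost.ts
// Back-of-envelope monthly cost at the listed per-token rates.
const inputTokens = 200_000_000 // hypothetical monthly input volume
const outputTokens = 50_000_000 // hypothetical monthly output volume

const cost = (inputPerM: number, outputPerM: number) =>
  (inputTokens / 1e6) * inputPerM + (outputTokens / 1e6) * outputPerM

console.log(cost(0.6, 2.4)) // highspeed list rates: $240
console.log(cost(0.3, 1.2)) // standard M2.5, assumed at half: $120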

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider    Context   Latency   Throughput   Input     Output    Cache Read   Cache Write   Release Date
MiniMax     205K      1.1s      53 tps       $0.60/M   $2.40/M   $0.03/M      $0.38/M       02/12/2026
Novita AI   205K      1.7s      74 tps       $0.60/M   $2.40/M   $0.03/M      —             02/12/2026

Each provider's Terms and Privacy policy are linked under Legal.
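To prefer one of these providers, the AI SDK lets you pass gateway-specific provider options alongside the model slug. This is a sketch only: the providerOptions.gateway.order field and the slugs 'novita' and 'minimax' are assumptions based on the slug-copy feature above, so confirm the exact shape in the docs.

prefer-provider.ts
import { streamText } from 'ai'

// Try Novita AI first, then fall back to MiniMax.
// The `order` option and slug values are assumptions; verify in the docs.
const result = streamText({
  model: 'minimax/minimax-m2.5-highspeed',
  prompt: 'Why is the sky blue?',
  providerOptions: {
    gateway: { order: ['novita', 'minimax'] },
  },
})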
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
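To sanity-check the P50 figures against your own traffic, you can time a single streamed request client-side. A minimal sketch: treating the first text chunk as the first token is an approximation, and the usage field names vary by AI SDK version.

measure.ts
import { streamText } from 'ai'

const start = performance.now()
let firstChunkAt = 0

const result = streamText({
  model: 'minimax/minimax-m2.5-highspeed',
  prompt: 'Why is the sky blue?',
})

for await (const chunk of result.textStream) {
  if (!firstChunkAt) firstChunkAt = performance.now() // ~TTFT
  process.stdout.write(chunk)
}

const elapsedSec = (performance.now() - start) / 1000
const usage = await result.usage // field names vary by SDK version
console.log(`\nTTFT: ${(firstChunkAt - start).toFixed(0)} ms`)
console.log(`~${((usage.outputTokens ?? 0) / elapsedSec).toFixed(0)} tps`)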

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by MiniMax

Model   Context   Latency   Throughput   Input     Output    Cache Read   Cache Write   Providers                         Release Date
—       205K      0.6s      178 tps      $0.30/M   $1.20/M   $0.06/M      $0.38/M       Fireworks, MiniMax, Novita, +1    03/18/2026
—       205K      1.2s      55 tps       $0.60/M   $2.40/M   $0.06/M      $0.38/M       MiniMax                           03/18/2026
—       1M        0.4s      257 tps      $0.27/M   $0.95/M   $0.03/M      $0.38/M       Bedrock, DeepInfra, MiniMax, +2   02/12/2026
—       205K      0.4s      334 tps      $0.30/M   $1.20/M   $0.03/M      $0.38/M       Bedrock, MiniMax, Novita          10/27/2025
—       205K      0.7s      76 tps       $0.30/M   $1.20/M   $0.03/M      $0.38/M       MiniMax, Novita                   10/27/2025
—       205K      1.1s      57 tps       $0.30/M   $2.40/M   $0.03/M      $0.38/M       MiniMax                           10/27/2025

What To Consider When Choosing a Provider

  • Pricing: The highspeed variant lists at roughly double the standard M2.5 input and output rates on many providers. AI Gateway's per-request cost tracking lets you measure whether the throughput gain pays for itself in your workload before you commit at scale.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token, so you do not need to manage provider credentials directly (see the sketch after this list).
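For the key-based path, a minimal sketch assuming the @ai-sdk/gateway package and its createGateway helper; the AI_GATEWAY_API_KEY environment variable name is the documented default, but verify it against the docs.

auth.ts
import { createGateway } from '@ai-sdk/gateway'
import { streamText } from 'ai'

// Explicit key-based auth. By default the gateway reads
// AI_GATEWAY_API_KEY from the environment (or a Vercel OIDC token),
// so this is only needed when you manage the key yourself.
const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY,
})

const result = streamText({
  model: gateway('minimax/minimax-m2.5-highspeed'),
  prompt: 'Why is the sky blue?',
})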

When to Use MiniMax M2.5 High Speed

Best For

  • Long-running coding agents: Autonomous sessions that span multi-minute or multi-hour workflows (see the wall-clock sketch after this list)
  • Streaming pair-programming: Chat interfaces where token delivery speed is user-visible
  • High-concurrency services: Workloads where per-request latency accumulates across simultaneous users
  • Wall-clock time priority: Cases where halving inference time justifies a 2x per-token cost increase
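To see how per-step speed compounds over a long session, here is a back-of-envelope sketch. The step count, tokens per step, and throughput figures are hypothetical; substitute the live metrics shown above.

agent-wall-clock.ts
// Hypothetical agent session: 40 sequential steps, ~1,500 output
// tokens per step, using the P50 TTFT from the provider table.
const steps = 40
const tokensPerStep = 1_500
const ttftSec = 1.1

const wallClockMinutes = (tps: number) =>
  (steps * (ttftSec + tokensPerStep / tps)) / 60

console.log(wallClockMinutes(50).toFixed(1))  // ~20.7 min at 50 tps
console.log(wallClockMinutes(100).toFixed(1)) // ~10.7 min at 100 tps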

Consider Alternatives When

  • Tasks finish quickly anyway: Requests complete in seconds regardless of throughput, so the speed premium has no practical impact
  • Cost over latency: Per-token cost is a harder constraint than latency and standard M2.5 delivers the same output for less
  • Multi-agent coordination: You need the features introduced in M2.7

Conclusion

MiniMax M2.5 High Speed occupies a clear niche: same model, faster output, higher price. It's the right pick when your agent or application is bottlenecked on token generation speed and the cost difference is justified by reduced wall-clock time or improved user experience.

Frequently Asked Questions

  • Is there any quality difference between MiniMax M2.5 High Speed and standard M2.5?

    None. Both variants share the same architecture, planning capabilities, and benchmark scores. The "highspeed" designation reflects inference throughput only.

  • What development tasks does MiniMax M2.5 High Speed cover?

    MiniMax M2.5 High Speed handles the full development lifecycle: specification writing, code generation, debugging, and deployment across Web, Android, iOS, Windows, and Mac platforms.

  • How does the pricing compare to standard M2.5?

    Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves MiniMax M2.5 High Speed.

  • What throughput can I expect?

    Live throughput metrics display on this page and update based on real traffic.

  • Should I use MiniMax M2.5 High Speed or M2.7 Highspeed?

    If you don't need M2.7's multi-agent orchestration or dynamic tool search, MiniMax M2.5 High Speed is more cost-effective at comparable speeds.

  • Can I try this model before integrating it?

    Yes. Open https://ai-sdk.dev/playground/minimax:minimax-m2.5-highspeed to evaluate output quality and perceived speed interactively.