MiniMax M2.7 High Speed
MiniMax M2.7 High Speed is the throughput-optimized variant of M2.7. It supports a context window of 204.8K tokens and a max output of 131.1K tokens.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'minimax/minimax-m2.7-highspeed',
  prompt: 'Why is the sky blue?',
})
```
What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
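As a minimal sketch of credential resolution under that model: the `AI_GATEWAY_API_KEY` variable name below is an assumption, and in deployed environments an OIDC token may stand in instead, so an absent key is a legitimate state rather than an error.

```typescript
// Hedged sketch: look up the gateway API key from the environment.
// AI_GATEWAY_API_KEY is an assumed variable name, not confirmed by this page.
function resolveGatewayKey(env: Record<string, string | undefined>): string | undefined {
  // Returns undefined when no key is set, e.g. when OIDC is used instead.
  return env.AI_GATEWAY_API_KEY
}
```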
MiniMax M2.7 High Speed lists at roughly 2x the standard M2.7 input and output rates on many providers. AI Gateway's per-request cost tracking helps you quantify whether the throughput gain justifies the expense for your workload.
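To make that trade-off concrete, here is a sketch of per-request cost under the roughly 2x premium. The baseline rates are placeholders for illustration, not published prices:

```typescript
// Placeholder baseline rates in USD per million tokens -- assumptions, not published prices.
const INPUT_RATE = 1.0
const OUTPUT_RATE = 4.0
const HIGH_SPEED_MULTIPLIER = 2 // the ~2x premium noted above

// Cost of one request at a given rate multiplier.
function requestCost(inputTokens: number, outputTokens: number, multiplier = 1): number {
  return (
    (inputTokens / 1_000_000) * INPUT_RATE * multiplier +
    (outputTokens / 1_000_000) * OUTPUT_RATE * multiplier
  )
}

const standard = requestCost(50_000, 8_000) // ~0.082
const highSpeed = requestCost(50_000, 8_000, HIGH_SPEED_MULTIPLIER) // ~0.164
```

Multiplying your observed per-request token counts through a table like this is the same arithmetic AI Gateway's cost tracking automates for you.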
When to Use MiniMax M2.7 High Speed
Best For
Parallel agent architectures: Per-agent token velocity directly compresses end-to-end task completion time.
Autonomous tool discovery: Workflows that must locate and invoke unfamiliar tools as subtasks emerge during execution.
Unified engineering and business workloads: Pipelines that need code generation and document processing from one endpoint.
Native orchestration replacement: Organizations replacing external middleware with a model that coordinates agents natively.
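The parallel-agent pattern above reduces to a simple fan-out. In this sketch, `runAgent` is a stand-in for a real model call (e.g. `generateText` against `minimax/minimax-m2.7-highspeed`), stubbed so the pattern runs without credentials:

```typescript
// Stand-in for a real model call; echoes so the sketch runs offline.
async function runAgent(prompt: string): Promise<string> {
  return `result for: ${prompt}`
}

// All subtasks run concurrently, so wall-clock time tracks the
// slowest agent rather than the sum of all agents.
async function fanOut(subtasks: string[]): Promise<string[]> {
  return Promise.all(subtasks.map(runAgent))
}

fanOut(['Summarize module A', 'Summarize module B']).then((results) => {
  console.log(results.length) // prints 2
})
```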
Consider Alternatives When
Independent agents: Your agents never exchange context, so an earlier high-speed variant handles isolated coding at lower cost.
Batch jobs without latency pressure: Standard M2.7 produces identical results at the baseline rate.
Budget ceiling exceeded: The 2x per-token premium exceeds your budget regardless of latency benefit.
Conclusion
MiniMax M2.7 High Speed adds agent orchestration, runtime tool discovery, and enterprise document work while sustaining the throughput that makes long-running, multi-agent sessions viable. It pairs the full M2.7 capability set with high-throughput inference for teams whose workloads have outgrown single-agent patterns.
FAQ
What does MiniMax M2.7 High Speed add over M2.5?
Three capabilities that didn't exist in M2.5: native multi-agent coordination (no external orchestration code), dynamic tool search (tools found at runtime rather than declared upfront), and enterprise office automation (document analysis, structured data, reporting).
How does dynamic tool search work?
Instead of receiving a fixed tool manifest in the prompt, MiniMax M2.7 High Speed evaluates the evolving task state and identifies relevant tools. It invokes them without prior declaration, expanding the model's effective action space over long sessions.
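That loop can be sketched with a hypothetical tool registry; `searchTools` and the registry entries below are illustrative, not part of the MiniMax or AI Gateway APIs:

```typescript
type Tool = { name: string; description: string }

// Hypothetical registry the model can search at runtime instead of
// receiving a fixed manifest in the prompt.
const registry: Tool[] = [
  { name: 'csv_parse', description: 'parse CSV text into rows' },
  { name: 'http_get', description: 'fetch the contents of a URL' },
  { name: 'sql_query', description: 'run a read-only SQL query' },
]

// The model emits a search query derived from the evolving task state;
// matching tools join its action space mid-session.
function searchTools(query: string): Tool[] {
  const q = query.toLowerCase()
  return registry.filter((t) => t.description.toLowerCase().includes(q))
}

const found = searchTools('sql') // matches only sql_query
```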
What do I need to change to migrate?
Only the model identifier string. Update to minimax/minimax-m2.7-highspeed in your API calls. The tool-calling format, API surface, and AI Gateway configuration stay the same.
Can it coordinate agents running other models?
Yes. The orchestration logic is native to the M2.7 architecture, but the agents it coordinates can run any model. Coordination fidelity is strongest when MiniMax M2.7 High Speed serves as the orchestrating agent.
Is it faster than the previous high-speed variant?
Both target comparable throughput (see live metrics on this page). The improvement is capability breadth per token, not token velocity.
When is standard M2.7 the better choice?
When nobody is waiting on the output. Background batch processing, scheduled overnight jobs, and any pipeline where wall-clock duration doesn't affect user experience or business outcomes.