MiniMax M2.7 High Speed

MiniMax M2.7 High Speed is the throughput-optimized variant of M2.7. It supports a context window of 204.8K tokens and a max output of 131.1K tokens.

Capabilities: Reasoning, Tool Use, Implicit Caching, Vision (Image), File Input

index.ts

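import { streamText } from 'ai'

// Request a streamed completion from the High Speed variant.
const result = streamText({
  model: 'minimax/minimax-m2.7-highspeed',
  prompt: 'Why is the sky blue?'
})

// Write each text chunk to stdout as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}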


Playground

Try out MiniMax M2.7 High Speed by MiniMax. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Release Date	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write
03/18/2026	205K	0.9s	52tps	$0.60/M	$2.40/M	$0.06/M	$0.38/M

Legal: Terms • Privacy
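The provider preference mentioned above is set per request. The following is a minimal sketch, assuming the preference is passed as an ordered list of slugs under a `gateway.order` key in the request's provider options; check the docs for the exact shape. The helper `withProviderOrder` and the slug `example-provider` are hypothetical placeholders, not values from this page.

```typescript
// Hypothetical helper: builds a provider-preference object for one request.
// The `gateway.order` shape is an assumption; verify it against the docs.
type GatewayOptions = { gateway: { order: string[] } }

function withProviderOrder(slugs: string[]): GatewayOptions {
  // Drop blanks and duplicates while keeping the caller's priority order.
  const seen = new Set<string>()
  const order: string[] = []
  for (const raw of slugs) {
    const slug = raw.trim()
    if (slug !== '' && !seen.has(slug)) {
      seen.add(slug)
      order.push(slug)
    }
  }
  return { gateway: { order } }
}

// 'example-provider' stands in for a slug copied from the Providers table.
const providerOptions = withProviderOrder(['example-provider', ' example-provider '])
console.log(JSON.stringify(providerOptions))
```

The resulting object would be passed alongside `model` and `prompt` in the request; normalizing the list first avoids sending duplicate or empty slugs.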

More models by MiniMax

Release Date	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write
03/18/2026	205K	0.4s	117tps	$0.30/M	$1.20/M	$0.06/M	$0.38/M
02/12/2026		0.5s	168tps	$0.27/M	$0.95/M	$0.03/M	$0.38/M
02/12/2026	205K	1.0s	50tps	$0.60/M	$2.40/M	$0.03/M	$0.38/M
10/27/2025	205K	0.4s	231tps	$0.30/M	$1.20/M	$0.03/M	$0.38/M
10/27/2025	205K	1.0s	59tps	$0.30/M	$1.20/M	$0.03/M	$0.38/M
10/27/2025	205K	0.9s	51tps	$0.30/M	$2.40/M	$0.03/M	$0.38/M

About MiniMax M2.7 High Speed

Three architectural capabilities separate the 2.7 generation from earlier MiniMax releases. MiniMax M2.7 High Speed delivers all three at accelerated inference.

1. Agent-to-agent orchestration without middleware. Earlier MiniMax models operated as isolated workers. Coordinating them required external scaffolding: custom code to pass context, manage handoffs, and track dependencies. MiniMax M2.7 High Speed internalizes that orchestration layer. It manages context propagation, dependency resolution, and agent handoffs natively. In parallel architectures, compressing per-agent token generation shortens the critical path.

2. Runtime tool discovery. Prior generations consumed a static tool manifest declared at prompt time. MiniMax M2.7 High Speed breaks that constraint: it identifies, evaluates, and invokes tools dynamically as a task unfolds. For long-horizon automation where required actions can't be predicted upfront, this reduces the need to pre-enumerate every tool interaction.

3. Enterprise document processing. Structured data extraction, report synthesis, spreadsheet analysis, and document transformation join the capability set. A single endpoint now serves both engineering automation and business-process work, reducing the number of specialized models you manage.

Throughput remains high (see live metrics on this page). The generational leap is in what the model accomplishes per token, not how many tokens it produces.
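To make the critical-path point in item 1 concrete: when agents run in parallel, wall-clock time is set by the slowest chain of dependent agents, so faster per-agent generation shortens that chain. The sketch below uses a deliberately simple model in which each agent takes a fixed latency plus output tokens divided by tokens per second. The agent names, token counts, and rates are invented for illustration and are not measurements from this page.

```typescript
// Illustrative model only: agent time = latency + outputTokens / tokensPerSecond.
// Assumes the dependency graph has no cycles.
type Agent = { name: string; outputTokens: number; dependsOn: string[] }

function criticalPathSeconds(agents: Agent[], tokensPerSecond: number, latencySeconds: number): number {
  const byName = new Map<string, Agent>()
  for (const a of agents) byName.set(a.name, a)
  const finish = new Map<string, number>()

  // An agent finishes after its slowest dependency plus its own generation time.
  const finishTime = (name: string): number => {
    const cached = finish.get(name)
    if (cached !== undefined) return cached
    const agent = byName.get(name)
    if (!agent) throw new Error(`unknown agent: ${name}`)
    let start = 0
    for (const dep of agent.dependsOn) start = Math.max(start, finishTime(dep))
    const done = start + latencySeconds + agent.outputTokens / tokensPerSecond
    finish.set(name, done)
    return done
  }

  let total = 0
  for (const a of agents) total = Math.max(total, finishTime(a.name))
  return total
}

// Hypothetical plan: two parallel workers feed one reviewer.
const plan: Agent[] = [
  { name: 'research', outputTokens: 1000, dependsOn: [] },
  { name: 'draft', outputTokens: 2000, dependsOn: [] },
  { name: 'review', outputTokens: 500, dependsOn: ['research', 'draft'] },
]
console.log(criticalPathSeconds(plan, 100, 1)) // 27
console.log(criticalPathSeconds(plan, 200, 1)) // 14.5
```

In this toy plan, doubling throughput nearly halves completion time; the fixed per-agent latency is what keeps the gain below a full 2x.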

What To Consider When Choosing a Provider

Configuration: MiniMax M2.7 High Speed lists at roughly 2x the standard M2.7 input and output rates on many providers. AI Gateway's per-request cost tracking helps you quantify whether the throughput gain justifies the expense for your workload.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
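The cost trade-off in the Configuration note comes down to simple arithmetic. The sketch below uses the High Speed rates listed on this page ($0.60 per million input tokens, $2.40 per million output tokens) and assumes standard M2.7 costs half of that, following the "roughly 2x" figure. The token counts and the `requestCostUsd` helper are hypothetical, and cache pricing is ignored.

```typescript
type Rates = { inputPerMillion: number; outputPerMillion: number }

// High Speed rates as listed on this page; the standard rates are an
// assumption derived from the "roughly 2x" statement.
const highSpeed: Rates = { inputPerMillion: 0.6, outputPerMillion: 2.4 }
const standard: Rates = { inputPerMillion: 0.3, outputPerMillion: 1.2 }

// Cost in USD for one request, ignoring cache reads and writes.
function requestCostUsd(rates: Rates, inputTokens: number, outputTokens: number): number {
  return (inputTokens * rates.inputPerMillion + outputTokens * rates.outputPerMillion) / 1e6
}

// Hypothetical workload: 20,000 input tokens and 4,000 output tokens.
const fast = requestCostUsd(highSpeed, 20000, 4000)
const base = requestCostUsd(standard, 20000, 4000)
console.log(fast.toFixed(4), base.toFixed(4), (fast / base).toFixed(2))
```

Multiplying the per-request difference by expected request volume gives a budget figure to weigh against the latency benefit.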

When to Use MiniMax M2.7 High Speed

Best For

Parallel agent architectures: Per-agent token velocity directly shortens end-to-end task completion time
Autonomous tool discovery: Workflows that must locate and invoke unfamiliar tools as subtasks emerge during execution
Unified engineering and business: Pipelines that need code generation and document processing from one endpoint
Native orchestration replacement: Organizations replacing external middleware with a model that coordinates agents natively
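As a conceptual illustration of looking tools up at run time instead of hard-coding them, the sketch below keeps a small client-side registry and searches it by keyword when a new subtask appears. It does not describe how the model discovers tools internally; the `ToolRegistry` class and both example tools are hypothetical.

```typescript
// Conceptual illustration only: a client-side registry searched by keyword.
type Tool = { name: string; description: string; run: (input: string) => string }

class ToolRegistry {
  private tools: Tool[] = []

  register(tool: Tool): void {
    this.tools.push(tool)
  }

  // Return tools whose description mentions every word of the query.
  search(query: string): Tool[] {
    const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0)
    return this.tools.filter((t) => {
      const text = t.description.toLowerCase()
      return words.every((w) => text.includes(w))
    })
  }
}

const registry = new ToolRegistry()
registry.register({
  name: 'csvSummary',
  description: 'Summarize a CSV spreadsheet',
  run: (s) => `rows: ${s.split('\n').length}`,
})
registry.register({
  name: 'wordCount',
  description: 'Count words in a document',
  run: (s) => String(s.split(/\s+/).filter(Boolean).length),
})

// A subtask appears mid-run; find a matching tool instead of naming it upfront.
const [match] = registry.search('spreadsheet')
console.log(match.name, match.run('a,b\n1,2')) // csvSummary rows: 2
```

A keyword match is the simplest possible lookup; a real system would more likely rank candidates by relevance before invoking one.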

Consider Alternatives When

Independent agents: Your agents never exchange context, so an earlier highspeed variant handles isolated coding at lower cost
Batch jobs without time pressure: Standard M2.7 produces identical results at the baseline rate
Budget ceiling exceeded: The 2x per-token premium exceeds your budget regardless of latency benefit

Conclusion

MiniMax M2.7 High Speed adds agent orchestration, runtime tool discovery, and enterprise document work while sustaining the throughput that makes long-running, multi-agent sessions viable. It pairs the full M2.7 capability set with high-throughput inference for teams whose workloads have outgrown single-agent patterns.

Frequently Asked Questions

What fundamentally changed between the 2.5 and 2.7 generations?

Three capabilities that didn't exist in M2.5: native multi-agent coordination (no external orchestration code), dynamic tool search (tools found at runtime rather than declared upfront), and enterprise office automation (document analysis, structured data, reporting).

How does runtime tool discovery work?

Instead of receiving a fixed tool manifest in the prompt, MiniMax M2.7 High Speed evaluates the evolving task state and identifies relevant tools. It invokes them without prior declaration, expanding the model's effective action space over long sessions.

Does switching from the previous highspeed variant require code changes?

Only the model identifier string. Update to `minimax/minimax-m2.7-highspeed` in your API calls. The tool-calling format, API surface, and AI Gateway configuration stay the same.

Can MiniMax M2.7 High Speed coordinate agents built on different model families?

Yes. The orchestration logic is native to the M2.7 architecture, but the agents it coordinates can run any model. Coordination fidelity is strongest when MiniMax M2.7 High Speed serves as the orchestrating agent.

Is throughput the same as the prior highspeed generation?

Both target comparable throughput (see live metrics on this page). The improvement is capability breadth per token, not token velocity.

When does the standard-rate M2.7 make more sense?

When nobody is waiting on the output. Background batch processing, scheduled overnight jobs, and any pipeline where wall-clock duration doesn't affect user experience or business outcomes.
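Because the migration is described as a one-string change, the swap can be isolated in a single helper. This is a minimal sketch: the `RequestSettings` shape and the `migrateToM27HighSpeed` helper are hypothetical, and the old identifier is a placeholder rather than a real model ID.

```typescript
// Identifier taken from this page.
const M27_HIGH_SPEED = 'minimax/minimax-m2.7-highspeed'

// Hypothetical shape of an app's request settings; only `model` changes.
type RequestSettings = { model: string; prompt: string; maxOutputTokens?: number }

function migrateToM27HighSpeed(settings: RequestSettings): RequestSettings {
  // Copy every other field untouched so the rest of the configuration carries over.
  return { ...settings, model: M27_HIGH_SPEED }
}

// 'previous-highspeed-model-id' is a placeholder, not a real identifier.
const before: RequestSettings = { model: 'previous-highspeed-model-id', prompt: 'Why is the sky blue?' }
const after = migrateToM27HighSpeed(before)
console.log(after.model) // minimax/minimax-m2.7-highspeed
```

Keeping the identifier in one constant also makes it easy to switch back if the newer model behaves differently on your workload.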
