MiniMax M2.1 Lightning

MiniMax M2.1 Lightning is the throughput-optimized variant of MiniMax-M2.1. It supports a context window of 204.8K tokens and a max output of 131.1K tokens per request.

Reasoning · Tool Use · Implicit Caching
index.ts

```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'minimax/minimax-m2.1-lightning',
  prompt: 'Why is the sky blue?'
})

// Stream the response as it is generated
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}
```

Frequently Asked Questions

  • Does MiniMax M2.1 Lightning produce different outputs than standard M2.1?

    No. MiniMax M2.1 Lightning produces identical outputs to standard M2.1. Only inference speed differs.

  • How much faster is MiniMax M2.1 Lightning compared to M2.1?

Lightning is the throughput-optimized variant of M2.1, built for higher output speed. See the live metrics on this page for current AI Gateway measurements.

  • Does automatic prompt caching apply to all requests?

    Yes. Prompt caching applies automatically with no manual configuration. It reduces latency for prompts with repeated context.

  • Is MiniMax M2.1 Lightning more expensive than M2.1?

Yes, typically. Expect about $0.30 per million input tokens and $2.40 per million output tokens for this variant (compare with standard M2.1 pricing on the same page).

  • What programming languages does MiniMax M2.1 Lightning support?

    The same languages as M2.1: Go, C++, JavaScript, C#, TypeScript, Rust, Java, Kotlin, and Objective-C.

  • Can I use MiniMax M2.1 Lightning for agentic workflows with tool calls?

    Yes. MiniMax M2.1 Lightning retains all of M2.1's agentic capabilities, including tool use, multi-step reasoning, and Interleaved Thinking.
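    A minimal sketch of such a workflow, assuming AI SDK 5 conventions (`tool`, `inputSchema`, `stopWhen`); the `getCityTime` tool is hypothetical and only illustrates the pattern, and parameter names may differ across SDK versions:

    ```typescript
    import { generateText, tool, stepCountIs } from 'ai'
    import { z } from 'zod'

    // Hypothetical tool for illustration; not part of the MiniMax API.
    // The model decides when to call it, the SDK runs `execute`,
    // and the result is fed back for another reasoning step.
    const getCityTime = tool({
      description: 'Look up the current time for a city',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, time: new Date().toISOString() })
    })

    // Wrapped in a function so the network call only runs when invoked
    // (requires AI Gateway credentials to be configured).
    async function main() {
      const { text } = await generateText({
        model: 'minimax/minimax-m2.1-lightning',
        tools: { getCityTime },
        stopWhen: stepCountIs(3), // allow up to 3 model/tool round trips
        prompt: 'What time is it in Tokyo?'
      })
      console.log(text)
    }
    ```

    Because tool handling lives in the SDK, the same loop works unchanged when switching between M2.1 and Lightning.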

  • How do I switch from M2.1 to MiniMax M2.1 Lightning in the AI SDK?

    Change the model identifier to minimax/minimax-m2.1-lightning. No other code changes are needed.
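    In code, the switch is a one-line change (a sketch, assuming the standard variant's identifier is minimax/minimax-m2.1):

    ```typescript
    import { streamText } from 'ai'

    // Before: const model = 'minimax/minimax-m2.1'  // standard variant (assumed slug)
    // After: only the identifier changes; prompts, tools, and options stay the same
    const model = 'minimax/minimax-m2.1-lightning'

    const result = streamText({ model, prompt: 'Why is the sky blue?' })
    ```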