MiniMax M2.1 Lightning

MiniMax M2.1 Lightning is the throughput-optimized variant of MiniMax-M2.1. It supports a context window of 204.8K tokens and a max output of 131.1K tokens per request.
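As a rough illustration of how the context window and max output limits interact, the sketch below clamps a requested output length so that prompt and output together stay within the window. The exact values `204_800` and `131_072` are assumptions inferred from the rounded figures quoted for this model, `clampMaxOutput` is a hypothetical helper rather than part of any SDK, and the sketch assumes prompt and output tokens share one context window.

```typescript
// Assumed exact limits; the page only gives rounded figures (204.8K / 131.1K).
const CONTEXT_WINDOW = 204_800
const MAX_OUTPUT = 131_072

// Hypothetical helper: largest output budget that still fits next to the prompt.
function clampMaxOutput(promptTokens: number, requested: number): number {
  if (promptTokens < 0 || requested < 0) {
    throw new RangeError('token counts must be non-negative')
  }
  // Tokens left in the window after the prompt, never below zero.
  const remaining = Math.max(0, CONTEXT_WINDOW - promptTokens)
  return Math.min(requested, MAX_OUTPUT, remaining)
}
```

A caller could pass the result as the output-token limit of a request, counting prompt tokens with whatever tokenizer estimate is available.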

Reasoning, Tool Use, Implicit Caching

index.ts

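import { streamText } from 'ai'

// Start a streaming request to the model.
const result = streamText({
  model: 'minimax/minimax-m2.1-lightning',
  prompt: 'Why is the sky blue?'
})

// Print the response incrementally as text chunks arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}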


About MiniMax M2.1 Lightning

MiniMax M2.1 Lightning shipped alongside M2.1 as its speed-optimized companion. The Lightning variant delivers faster inference while producing the same outputs as standard M2.1, so you don't trade quality for throughput.

The model supports the same programming languages as M2.1: Go, C++, JavaScript, C#, TypeScript, Rust, Java, Kotlin, and Objective-C. It also carries the same Interleaved Thinking capability and agentic tool-use support that define the 2.1 generation.

Automatic prompt caching is built in with no manual configuration. This further reduces effective latency for repeated or structurally similar prompts. MiniMax M2.1 Lightning suits developer tools, IDE assistants, and any application where users expect near-instantaneous code suggestions.
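One way to make prompts "structurally similar" is to keep the stable instructions at the front and the per-request text at the end. The sketch below shows that layout. It assumes the cache matches on a shared leading portion of the prompt, which this page does not confirm, and both `buildPrompt` and `sharedPrefixLength` are hypothetical helpers for illustration only.

```typescript
// Hypothetical helper: stable instructions first, per-request text last,
// so repeated calls share a long common prefix.
function buildPrompt(stableInstructions: string, userInput: string): string {
  return `${stableInstructions.trim()}\n\n${userInput.trim()}`
}

// Length of the common leading run of characters in two prompts.
// Useful as a quick check of how much two requests have in common.
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length)
  let i = 0
  while (i < limit && a[i] === b[i]) i++
  return i
}
```

The output of `buildPrompt` can be passed as the `prompt` value in the `streamText` call from index.ts. No cache-specific option is needed, since caching is described as automatic.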
