MiniMax M3

MiniMax M3 is MiniMax's first model with a 1M-token context window and native multimodal input. It targets software engineering, terminal-based tool use, and agentic web browsing, with a maximum output of 1M tokens per request.

Implicit Caching, Reasoning, Tool Use, Vision (Image)

index.ts

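import { streamText } from 'ai'

// Stream a completion from MiniMax M3.
const result = streamText({
  model: 'minimax/minimax-m3',
  prompt: 'Why is the sky blue?'
})

// Print the text as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}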


About MiniMax M3

MiniMax M3 is built around MiniMax Sparse Attention (MSA), an attention variant that splits the key-value cache into blocks and pre-filters which blocks contribute to each query. That design supports the 1M-token context window without the quadratic compute scaling of full attention, and it lets MiniMax M3 keep prefill and decode efficient on long inputs.
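The block pre-filtering idea can be illustrated with a small, generic sketch. This is not MiniMax's implementation: the page does not describe how MSA scores or selects blocks, so the mean-key block summary, the top-k selection rule, and the names `selectBlocks` and `sparseAttention` are all assumptions made for illustration.

```typescript
type Vec = number[]

const dot = (a: Vec, b: Vec): number => a.reduce((s, x, i) => s + x * b[i], 0)

// Mean of the key vectors in one block, used as a cheap block summary (assumption).
function meanVec(rows: Vec[]): Vec {
  const out = new Array<number>(rows[0].length).fill(0)
  for (const r of rows) r.forEach((x, i) => { out[i] += x / rows.length })
  return out
}

// Pre-filter: score each block summary against the query, keep the top-k block indices.
function selectBlocks(query: Vec, keys: Vec[], blockSize: number, topK: number): number[] {
  const scores: { block: number; score: number }[] = []
  for (let start = 0; start < keys.length; start += blockSize) {
    const summary = meanVec(keys.slice(start, start + blockSize))
    scores.push({ block: start / blockSize, score: dot(query, summary) })
  }
  return scores
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(s => s.block)
    .sort((a, b) => a - b)
}

// Softmax attention computed only over positions in the selected blocks.
function sparseAttention(
  query: Vec, keys: Vec[], values: Vec[], blockSize: number, topK: number
): Vec {
  const idx: number[] = []
  for (const b of selectBlocks(query, keys, blockSize, topK)) {
    const end = Math.min((b + 1) * blockSize, keys.length)
    for (let i = b * blockSize; i < end; i++) idx.push(i)
  }
  const scale = 1 / Math.sqrt(query.length)
  const logits = idx.map(i => dot(query, keys[i]) * scale)
  const max = Math.max(...logits)
  const weights = logits.map(l => Math.exp(l - max))
  const total = weights.reduce((s, w) => s + w, 0)
  const out = new Array<number>(values[0].length).fill(0)
  idx.forEach((i, j) => values[i].forEach((v, d) => { out[d] += (weights[j] / total) * v }))
  return out
}
```

In this sketch, each query pays for one score per block plus full attention over `topK * blockSize` positions, rather than one score per cached token. That is the general reason block selection avoids quadratic growth on long inputs.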

Native multimodality is wired in from the start of training rather than bolted on later. MiniMax M3 accepts text, image, and video input and produces text output. The pretraining pipeline aligns visual and textual semantics directly, which carries over to multimodal coding tasks like analyzing a screenshot of a failing test and writing a patch, or reproducing a bug from a GitHub issue thread.
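A screenshot-plus-text request for the failing-test scenario might be assembled as in the sketch below. The message shape mirrors the AI SDK's text and image content parts. The helper name `buildScreenshotPrompt` and the prompt wording are hypothetical, and the local types are simplified stand-ins rather than the SDK's own type definitions.

```typescript
// Simplified stand-ins for the AI SDK's user-message content parts.
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: Uint8Array | URL | string }
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> }

// Hypothetical helper: pair a screenshot of a failing test with its log output.
function buildScreenshotPrompt(screenshot: Uint8Array | URL, testLog: string): UserMessage[] {
  return [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: `This screenshot shows a failing test. Propose a patch.\n\nTest log:\n${testLog}`,
        },
        { type: 'image', image: screenshot },
      ],
    },
  ]
}

// Possible usage with the AI SDK (not executed here):
//   import { streamText } from 'ai'
//   const result = streamText({
//     model: 'minimax/minimax-m3',
//     messages: buildScreenshotPrompt(screenshotBytes, log),
//   })
```

Keeping the payload construction in a plain function makes it easy to unit-test without calling the model.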

MiniMax M3 is positioned for software engineering, terminal-based tool use, and agentic web browsing. It scores 59.0% on SWE-Bench Pro and 70.06% on OSWorld-Verified for computer use. Automatic prompt caching is enabled by default, which reduces effective cost on repeated context patterns common in agent loops.
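To benefit from automatic caching in an agent loop, it generally helps to keep the start of each request identical and only append new turns. The sketch below assumes the cache matches on a shared request prefix, which is common for implicit caching but is not confirmed on this page. The helpers `buildRequest` and `sharedPrefixLength` are hypothetical.

```typescript
type Message = { role: 'system' | 'user' | 'assistant'; content: string }

// Build each request as a fixed system prompt followed by append-only history.
function buildRequest(systemPrompt: string, history: Message[]): Message[] {
  return [{ role: 'system', content: systemPrompt }, ...history]
}

// Count the leading messages that are identical between two requests.
// Under the prefix-matching assumption, this is the part a cache could reuse.
function sharedPrefixLength(prev: Message[], next: Message[]): number {
  let n = 0
  while (
    n < prev.length &&
    n < next.length &&
    prev[n].role === next[n].role &&
    prev[n].content === next[n].content
  ) {
    n++
  }
  return n
}

// Two consecutive agent steps: the second only appends to the first.
const system = 'You are a coding agent. Use the terminal tool when needed.'
const step1 = buildRequest(system, [{ role: 'user', content: 'Run the test suite.' }])
const step2 = buildRequest(system, [
  { role: 'user', content: 'Run the test suite.' },
  { role: 'assistant', content: '2 tests failed in parser.test.ts.' },
  { role: 'user', content: 'Fix the first failure.' },
])
const reusable = sharedPrefixLength(step1, step2)
```

Here every message of the first request is a prefix of the second. Editing the system prompt or reordering earlier turns between steps would shrink that shared prefix.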
