
Qwen3 Max

Qwen3 Max is Alibaba's trillion-parameter MoE language model with a context window of 262.1K tokens, delivering competitive performance on coding, mathematics, and enterprise tool-use tasks.

Tool Use · Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3-max',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}

Frequently Asked Questions

  • How many parameters does Qwen3 Max have?

    The model exceeds one trillion total parameters. It's served as a closed-weight API, and model weights aren't available for download.

  • What is the context window for Qwen3 Max?

    The context window is 262.1K tokens. This supports long document analysis and extended multi-turn sessions.

  • How does Qwen3 Max handle context caching?

    The model supports context caching: a repeated long prefix, such as a large system prompt or document, can be processed once and reused across many requests, reducing latency and cost. A minimal usage sketch follows these FAQs.

  • What is the difference between Qwen3 Max and Qwen3-Max-Thinking?

    Qwen3 Max is optimized for fast, high-quality responses without extended internal reasoning traces. Qwen3-Max-Thinking adds a dedicated thinking mode where the model works through complex problems step by step, making it better suited to hard math, competitive coding, and scientific reasoning at the cost of higher token usage.

  • Does Qwen3 Max support function calling?

    Yes. Qwen3 Max was specifically evaluated on tool-use benchmarks (Tau2-Bench: 74.8) and is designed for multi-step agentic workflows involving structured API calls. A tool-calling sketch follows these FAQs.

  • Can Qwen3 Max generate outputs in both Chinese and English?

    Yes. Alibaba positions Qwen3 Max with strong native support for both Chinese and English, alongside broad multilingual capability.

  • How does Qwen3 Max score on coding benchmarks?

    On SWE-bench Verified, Qwen3 Max recorded a score of 69.6, placing it competitively among other models evaluated on software engineering tasks.
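
Context caching sketch

Implicit caching needs no special request parameters; the point is to keep the long prefix byte-identical across calls so the provider can reuse its processed form. Below is a minimal TypeScript sketch assuming the same Gateway-style model string as the example above and a hypothetical policy.md file; it is an illustration of the pattern, not an official example from this page.

import { generateText } from 'ai'
import { readFileSync } from 'node:fs'

// A long, stable prefix shared by every request (hypothetical file).
// Because the prefix is identical on each call, the provider's implicit
// cache can reuse the processed prompt after the first request.
const policy = readFileSync('policy.md', 'utf8')

const questions = [
  'Summarize the refund policy in two sentences.',
  'Which clauses mention data retention?',
]

for (const question of questions) {
  const { text } = await generateText({
    model: 'alibaba/qwen3-max',
    system: policy, // identical prefix on each call
    prompt: question,
  })
  console.log(text)
}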
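
Tool-calling sketch

For function calling, the sketch below uses the AI SDK's tool() helper with a hypothetical getWeather tool. Exact option names (for example parameters vs. inputSchema, or how multi-step runs are capped) vary across AI SDK versions, so treat this as an illustration under those assumptions rather than the definitive integration.

import { generateText, tool, stepCountIs } from 'ai'
import { z } from 'zod'

const { text } = await generateText({
  model: 'alibaba/qwen3-max',
  tools: {
    // Hypothetical tool for illustration; replace execute with a real lookup
    getWeather: tool({
      description: 'Get the current weather for a city',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, conditions: 'clear', tempC: 21 }),
    }),
  },
  // Allow the model to call the tool and then produce a final answer
  stopWhen: stepCountIs(3),
  prompt: 'What is the weather like in Hangzhou right now?',
})

console.log(text)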