
Qwen 3 Max Thinking

Qwen 3 Max Thinking is Alibaba's trillion-parameter reasoning model that autonomously deploys built-in search, memory, and code interpreter tools during inference, achieving a score of 49.8 on Humanity's Last Exam with search enabled.

Reasoning · Tool Use · Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3-max-thinking',
  prompt: 'Why is the sky blue?',
})

// streamText starts lazily; consume the stream to receive output
for await (const text of result.textStream) {
  process.stdout.write(text)
}

Frequently Asked Questions

  • What tools does Qwen 3 Max Thinking invoke autonomously?

    The model has three integrated tools: Search (for retrieving current information), Memory (for tracking user preferences and context across turns), and Code Interpreter (for executing code to verify or compute results). It selects among them without user prompting.

  • How does the thinking mode differ from standard completion mode?

    In thinking mode, the model generates an extended internal reasoning trace before producing its final answer. This trace is visible in the response and reflects multi-step problem decomposition, which improves accuracy on hard reasoning tasks at the cost of more output tokens.
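The reasoning trace and the final answer arrive as separately typed stream parts, so an application can display or discard the trace independently. A minimal sketch of routing the two, assuming part type names modeled on the AI SDK's `fullStream` (`reasoning-delta` vs. `text-delta`; exact names vary by SDK version):

```typescript
// Stream parts as emitted in thinking mode: reasoning deltas, then answer text.
// The part type names are assumptions modeled on the AI SDK's fullStream parts.
type StreamPart =
  | { type: 'reasoning-delta'; text: string }
  | { type: 'text-delta'; text: string }

// Accumulate the visible reasoning trace separately from the final answer.
function splitTrace(parts: StreamPart[]): { reasoning: string; answer: string } {
  let reasoning = ''
  let answer = ''
  for (const part of parts) {
    if (part.type === 'reasoning-delta') reasoning += part.text
    else answer += part.text
  }
  return { reasoning, answer }
}
```

In a live stream you would iterate the result's `fullStream` and route each part the same way, which is also where the extra output tokens from the trace become visible.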

  • What did the model score on Humanity's Last Exam?

    With its Search tool enabled, Qwen 3 Max Thinking scored 49.8 on Humanity's Last Exam, a benchmark of ~3,000 graduate-level questions across math, science, and engineering.

  • Which SDKs are compatible with this model via AI Gateway?

    You can access Qwen 3 Max Thinking through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats supported by AI Gateway.
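Because the gateway speaks the standard Chat Completions format, any HTTP client works without an SDK. A sketch using plain `fetch`; the endpoint URL and the `AI_GATEWAY_API_KEY` variable name are assumptions to verify against your gateway dashboard:

```typescript
// Assumed gateway endpoint -- confirm against your AI Gateway configuration.
const endpoint = 'https://ai-gateway.vercel.sh/v1/chat/completions'

// Build a standard Chat Completions request body for this model.
function buildRequest(prompt: string) {
  return {
    model: 'alibaba/qwen3-max-thinking',
    messages: [{ role: 'user' as const, content: prompt }],
  }
}

async function ask(prompt: string): Promise<string> {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildRequest(prompt)),
  })
  if (!res.ok) throw new Error(`Gateway error: ${res.status}`)
  const data = await res.json()
  return data.choices[0].message.content
}
```

The same request body works unchanged from any language with an HTTP client, which is what makes the model reachable from SDKs the gateway does not ship adapters for.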

  • Does autonomous tool use mean I can't control when tools fire?

    You can't manually control which tools fire. The model selects tools as an internal reasoning step. If your application requires deterministic, manually orchestrated tool calls, the non-thinking Qwen3-Max variant with explicit function-calling schemas is a better fit.

  • How does the Search tool reduce hallucinations?

When its internal confidence on a factual claim is low, Qwen 3 Max Thinking automatically issues a search query to retrieve current information instead of answering from its internal knowledge alone. This reduces the frequency of confidently stated incorrect facts.

  • What is the token cost model for thinking-mode responses?

    Thinking-mode completions generate reasoning traces that can be substantially longer than direct answers. Output tokens from both the reasoning chain and the final response are billed.
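A rough spend estimate therefore has to count the reasoning trace at the output rate alongside the final answer. A minimal sketch with hypothetical field names and illustrative per-million-token prices (real prices come from the gateway's pricing page):

```typescript
// Hypothetical usage breakdown for one thinking-mode completion.
interface ThinkingUsage {
  inputTokens: number
  reasoningTokens: number // tokens in the visible reasoning trace
  answerTokens: number // tokens in the final answer
}

// Both the reasoning chain and the final response bill as output tokens.
function billedOutputTokens(u: ThinkingUsage): number {
  return u.reasoningTokens + u.answerTokens
}

// Estimate cost in USD given per-million-token rates (illustrative values only).
function estimateCostUSD(
  u: ThinkingUsage,
  inputPerMTok: number,
  outputPerMTok: number
): number {
  return (
    (u.inputTokens / 1e6) * inputPerMTok +
    (billedOutputTokens(u) / 1e6) * outputPerMTok
  )
}
```

Note that on hard reasoning tasks the trace routinely dwarfs the answer, so the output term usually dominates the estimate.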