Qwen 3 Max Thinking
Qwen 3 Max Thinking is Alibaba's trillion-parameter reasoning model. It autonomously invokes built-in search, memory, and code-interpreter tools during inference, and scores 49.8 on Humanity's Last Exam with search enabled.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3-max-thinking',
  prompt: 'Why is the sky blue?',
})
```
What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
Thinking-mode responses can be substantially longer than standard completions, so factor output token volume into per-request cost estimates across providers.
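As a rough illustration, a small helper can estimate per-request cost from token counts. The per-million-token prices below are placeholder assumptions for the sketch, not Alibaba's or the Gateway's actual rates.

```typescript
// Hypothetical per-million-token prices -- placeholders, not real rates.
const INPUT_PRICE_PER_M = 1.2
const OUTPUT_PRICE_PER_M = 6.0

// Thinking mode bills reasoning tokens as output tokens, so they are
// counted alongside the final answer when estimating cost.
function estimateCostUSD(
  inputTokens: number,
  reasoningTokens: number,
  answerTokens: number,
): number {
  const outputTokens = reasoningTokens + answerTokens
  return (inputTokens * INPUT_PRICE_PER_M + outputTokens * OUTPUT_PRICE_PER_M) / 1_000_000
}

// A thinking trace can dwarf the final answer: here 4,000 reasoning
// tokens accompany a 500-token answer to a 200-token prompt.
console.log(estimateCostUSD(200, 4_000, 500).toFixed(4))
```

The point of the sketch is the shape of the bill: at these assumed rates, the reasoning trace accounts for most of the cost even though the user never asked for it explicitly.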
When to Use Qwen 3 Max Thinking
Best For
STEM and engineering questions:
Hard scientific, mathematical, or engineering problems that benefit from visible chain-of-thought reasoning
Search-grounded research assistance:
Workflows where real-time search reduces the risk of stale or invented information
Self-correcting numerical agents:
Automated agents that route calculations through a code interpreter to verify outputs
Multi-turn personalized conversations:
Extended sessions where the model must remember and adapt to user preferences
Autonomous tool orchestration:
Complex pipelines where the model decides which tool to invoke without explicit user prompting
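The "self-correcting numerical agents" pattern above boils down to re-computing a claimed result instead of trusting it. A generic sketch of that verify-by-execution idea, independent of any Gateway API:

```typescript
// Generic sketch of the verify-by-execution pattern: a model's numeric
// claim is re-checked by actually computing the value, within a tolerance
// for floating-point noise.
function verifyAnswer(
  claimed: number,
  compute: () => number,
  tolerance = 1e-9,
): boolean {
  return Math.abs(claimed - compute()) <= tolerance
}

// 0.1 + 0.2 is 0.30000000000000004 in floating point, but the claim
// 0.3 is accepted because it falls within tolerance.
console.log(verifyAnswer(0.3, () => 0.1 + 0.2)) // true
console.log(verifyAnswer(1, () => 2)) // false
```

In Qwen 3 Max Thinking's case, the `compute` step is played by the built-in code interpreter; the model decides on its own when a claim is worth re-running.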
Consider Alternatives When
Tight token budgets:
Thinking-mode responses consume substantially more output tokens than direct-answer models
Low-latency requirements:
If fast completions matter more than extended reasoning, the thinking trace is unnecessary overhead
Deterministic tool control:
Autonomous tool selection may not align with strictly orchestrated pipelines that need manual control
Creative or conversational tasks:
Step-by-step reasoning adds friction rather than value in these contexts
Conclusion
Qwen 3 Max Thinking is built for situations where getting the right answer matters more than getting it quickly: graduate-level problem solving, real-time research synthesis, and agentic pipelines that need a model capable of recognizing its own knowledge gaps. The combination of autonomous tool orchestration and transparent reasoning chains fits complex, open-ended inference tasks.
FAQ
What built-in tools does the model include?
The model has three integrated tools: Search (for retrieving current information), Memory (for tracking user preferences and context across turns), and Code Interpreter (for executing code to verify or compute results). It selects among them without user prompting.
What does thinking mode do?
In thinking mode, the model generates an extended internal reasoning trace before producing its final answer. This trace is visible in the response and reflects multi-step problem decomposition, which improves accuracy on hard reasoning tasks at the cost of more output tokens.
How does it perform on Humanity's Last Exam?
With its Search tool enabled, Qwen 3 Max Thinking scored 49.8 on Humanity's Last Exam, a benchmark of ~3,000 graduate-level questions across math, science, and engineering.
How do I access the model?
You can access Qwen 3 Max Thinking through the AI SDK, Chat Completions API, Responses API, Messages API, or other API formats supported by AI Gateway.
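For the OpenAI-compatible Chat Completions format, the request is an ordinary JSON payload naming the Gateway model slug. The sketch below only builds and prints that payload; the endpoint URL and header in the comment are assumptions, not documented Gateway values.

```typescript
// Sketch of a Chat Completions-style request body for the Gateway.
const body = {
  model: 'alibaba/qwen3-max-thinking',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
  stream: true,
}

// The payload would then be POSTed with an API key, along the lines of
// (URL and env var name are placeholders):
//
// fetch('https://gateway.example/v1/chat/completions', {
//   method: 'POST',
//   headers: {
//     Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
//     'Content-Type': 'application/json',
//   },
//   body: JSON.stringify(body),
// })
console.log(JSON.stringify(body))
```

Because the Gateway authenticates with its own API key or OIDC token, no Alibaba credentials appear anywhere in the request.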
Can I control which tools the model uses?
You can't manually control which tools fire. The model selects tools as an internal reasoning step. If your application requires deterministic, manually orchestrated tool calls, the non-thinking Qwen3-Max variant with explicit function-calling schemas is a better fit.
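For contrast, "deterministic, manually orchestrated" means tool routing lives in application code rather than in the model. This is a generic sketch of that style, not Gateway or AI SDK API:

```typescript
// Hypothetical sketch of deterministic tool routing: the application,
// not the model, decides which tool handles each step.
type ToolName = 'search' | 'calculator'

const tools: Record<ToolName, (input: string) => string> = {
  // Stub implementations; real tools would call out to services.
  search: (q) => `search results for: ${q}`,
  calculator: (expr) => {
    const [a, b] = expr.split('+').map(Number)
    return String(a + b)
  },
}

// The caller names the tool explicitly, so every run is reproducible.
function runStep(tool: ToolName, input: string): string {
  return tools[tool](input)
}

console.log(runStep('calculator', '40+2')) // 42
```

With Qwen 3 Max Thinking, the equivalent of `runStep`'s first argument is chosen inside the model's reasoning trace, which is exactly what strictly orchestrated pipelines cannot accept.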
How does the model reduce hallucinated facts?
When Qwen 3 Max Thinking's internal confidence in a factual claim is low, it automatically issues a search query to retrieve current information rather than answering from its internal knowledge alone. This reduces the frequency of confidently stated incorrect facts.
How are thinking-mode completions billed?
Thinking-mode completions generate reasoning traces that can be substantially longer than direct answers. Output tokens from both the reasoning chain and the final response are billed.