Qwen 3 Max Thinking
Qwen 3 Max Thinking is Alibaba's trillion-parameter reasoning model that autonomously deploys built-in search, memory, and code interpreter tools during inference, achieving a score of 49.8 on Humanity's Last Exam with search enabled.
```typescript
import { streamText } from 'ai';

const result = streamText({
  model: 'alibaba/qwen3-max-thinking',
  prompt: 'Why is the sky blue?',
});
```

Playground
Try out Qwen 3 Max Thinking by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About Qwen 3 Max Thinking
Qwen 3 Max Thinking extends the Qwen3-Max architecture with a dedicated extended reasoning mode and integrated autonomous tool use. When Qwen 3 Max Thinking encounters a question that exceeds its internal knowledge or requires computation, it independently decides whether to trigger its Search tool (for current information), its Memory tool (for cross-turn context persistence), or its Code Interpreter (for numerical verification and data processing), without you needing to specify which tool applies.
This autonomous tool selection is a meaningful architectural distinction. Rather than exposing tool invocation as an explicit user-facing control, Qwen 3 Max Thinking treats it as an internal reasoning step, making the interaction feel more like working with a capable assistant that knows when to check its work. The design is intended to reduce hallucination risk on factual queries by defaulting to retrieval when confidence is low, and to improve numerical accuracy by routing computations through an interpreter.
Qwen 3 Max Thinking's thinking mode exposes its reasoning chain before delivering a final answer, providing transparency into multi-step problem decomposition. On Humanity's Last Exam, a benchmark of approximately 3,000 graduate-level questions spanning mathematics, science, and engineering, Qwen 3 Max Thinking with search enabled scored 49.8, competitive with other models on the same benchmark in Alibaba's published comparisons. You can access Qwen 3 Max Thinking through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats supported by AI Gateway.
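As a concrete sketch of the Chat Completions route, the request is a standard OpenAI-style payload carrying the model slug above. The gateway URL shown here is an assumption for illustration; check your AI Gateway documentation for the actual endpoint.

```typescript
// Sketch: build an OpenAI-style Chat Completions request for the gateway.
// The base URL is an assumption -- verify it against the AI Gateway docs.
const GATEWAY_URL = 'https://ai-gateway.vercel.sh/v1/chat/completions';

interface ChatRequest {
  url: string;
  body: {
    model: string;
    messages: { role: 'user' | 'system'; content: string }[];
    stream: boolean;
  };
}

function buildChatRequest(prompt: string): ChatRequest {
  return {
    url: GATEWAY_URL,
    body: {
      model: 'alibaba/qwen3-max-thinking',
      messages: [{ role: 'user', content: prompt }],
      stream: true, // stream tokens as they are generated
    },
  };
}

// To send: POST req.body as JSON to req.url with an
// `Authorization: Bearer <api key>` header.
const req = buildChatRequest('Why is the sky blue?');
```

The same payload shape works across any OpenAI-compatible client; only the base URL and API key change.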
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
| Provider | Throughput (TPS, P50) | TTFT (ms, P50) | Success rate |
|---|---|---|---|

Throughput is P50 on live AI Gateway traffic, in tokens per second (TPS); TTFT is P50 time to first token, in milliseconds; success rate is the direct request success rate on AI Gateway and per provider. Visit the docs for more info.
What To Consider When Choosing a Provider
- Configuration: Thinking-mode responses can be substantially longer than standard completions; factor output token volume into per-request cost estimates across providers.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Qwen 3 Max Thinking
Best For
- STEM and engineering questions: Hard scientific, mathematical, or engineering problems that benefit from visible chain-of-thought reasoning
- Search-grounded research assistance: Workflows where real-time search reduces the risk of stale or invented information
- Self-correcting numerical agents: Automated agents that route calculations through a code interpreter to verify outputs
- Multi-turn personalized conversations: Extended sessions where the model must remember and adapt to user preferences
- Autonomous tool orchestration: Complex pipelines where the model decides which tool to invoke without explicit user prompting
Consider Alternatives When
- Tight token budgets: Thinking-mode responses consume substantially more output tokens than direct-answer models
- Low-latency requirements: Fast completions matter more than extended reasoning, which becomes unnecessary overhead
- Deterministic tool control: Autonomous tool selection may not align with strictly orchestrated pipelines that need manual control
- Creative or conversational tasks: Step-by-step reasoning adds friction rather than value in these contexts
Conclusion
Qwen 3 Max Thinking is built for situations where getting the right answer matters more than getting it quickly: graduate-level problem solving, real-time research synthesis, and agentic pipelines that need a model capable of recognizing its own knowledge gaps. The combination of autonomous tool orchestration and transparent reasoning chains fits complex, open-ended inference tasks.
Frequently Asked Questions
What tools does Qwen 3 Max Thinking invoke autonomously?
The model has three integrated tools: Search (for retrieving current information), Memory (for tracking user preferences and context across turns), and Code Interpreter (for executing code to verify or compute results). It selects among them without user prompting.
How does the thinking mode differ from standard completion mode?
In thinking mode, the model generates an extended internal reasoning trace before producing its final answer. This trace is visible in the response and reflects multi-step problem decomposition, which improves accuracy on hard reasoning tasks at the cost of more output tokens.
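When the trace arrives inline in the response text, Qwen-family models conventionally wrap it in `<think>…</think>` tags; whether your chosen API format does this, or instead returns a separate reasoning field, is an assumption to verify. A minimal splitter under that assumption:

```typescript
// Split an inline reasoning trace from the final answer.
// Assumes the trace is wrapped in <think>...</think> tags
// (a Qwen-family convention; some API formats return the
// trace as a separate field instead).
function splitThinking(raw: string): { reasoning: string; answer: string } {
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) return { reasoning: '', answer: raw.trim() };
  const answer = raw.slice(match.index! + match[0].length).trim();
  return { reasoning: match[1].trim(), answer };
}

const out = splitThinking(
  '<think>Rayleigh scattering favors short wavelengths.</think>' +
    'The sky is blue because shorter wavelengths scatter more.'
);
// out.reasoning holds the trace; out.answer holds the final text.
```

Separating the two lets you log or display the trace without billing surprises leaking into downstream prompts.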
What did the model score on Humanity's Last Exam?
With its Search tool enabled, Qwen 3 Max Thinking scored 49.8 on Humanity's Last Exam, a benchmark of ~3,000 graduate-level questions across math, science, and engineering.
Which SDKs are compatible with this model via AI Gateway?
You can access Qwen 3 Max Thinking through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats supported by AI Gateway.
Does autonomous tool use mean I can't control when tools fire?
You can't manually control which tools fire. The model selects tools as an internal reasoning step. If your application requires deterministic, manually orchestrated tool calls, the non-thinking Qwen3-Max variant with explicit function-calling schemas is a better fit.
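For contrast, a deterministic pipeline using explicit function calling declares its tools up front in the request. The tool name and parameters below are hypothetical, and the non-thinking variant's model slug is assumed; check the model catalog for the exact identifier.

```typescript
// Hypothetical explicit tool declaration in OpenAI-style
// function-calling format: the application, not the model's
// internal reasoning, decides which tools exist and which one fires.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'lookup_order_status', // hypothetical tool name
      description: 'Fetch the current status of an order by ID.',
      parameters: {
        type: 'object',
        properties: {
          orderId: { type: 'string', description: 'Order identifier' },
        },
        required: ['orderId'],
      },
    },
  },
];

// A request that forces a call to that specific tool, rather than
// letting the model choose autonomously. Model slug is assumed.
const body = {
  model: 'alibaba/qwen3-max', // non-thinking variant (assumed slug)
  messages: [{ role: 'user' as const, content: 'Where is order 1234?' }],
  tools,
  tool_choice: {
    type: 'function' as const,
    function: { name: 'lookup_order_status' },
  },
};
```

With `tool_choice` pinned, the pipeline knows exactly which call comes back, which is what strictly orchestrated systems usually need.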
How does the Search tool reduce hallucinations?
When Qwen 3 Max Thinking's internal confidence on a factual claim is low, it automatically triggers a search query to retrieve current information rather than generating an answer from its internal knowledge alone, which reduces the frequency of confidently stated incorrect facts.
What is the token cost model for thinking-mode responses?
Thinking-mode completions generate reasoning traces that can be substantially longer than direct answers. Output tokens from both the reasoning chain and the final response are billed.
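As a back-of-envelope sketch of that billing model, total output cost scales with reasoning tokens plus answer tokens. The per-token rate below is a placeholder, not a real price:

```typescript
// Estimate per-request output cost. The rate is a hypothetical
// placeholder -- substitute your provider's actual pricing.
const OUTPUT_RATE_PER_MILLION = 6.0; // USD per 1M output tokens (placeholder)

function estimateOutputCost(reasoningTokens: number, answerTokens: number): number {
  // Both the reasoning trace and the final answer bill as output tokens.
  const totalOutput = reasoningTokens + answerTokens;
  return (totalOutput / 1_000_000) * OUTPUT_RATE_PER_MILLION;
}

// A response with a 4,000-token trace and a 500-token answer costs
// 9x a 500-token direct answer at the same (placeholder) rate:
const thinking = estimateOutputCost(4000, 500);
const direct = estimateOutputCost(0, 500);
```

The ratio, not the placeholder rate, is the point: long traces dominate per-request cost, so budget for them explicitly.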