Qwen 3 Max Thinking
Qwen 3 Max Thinking is Alibaba's trillion-parameter reasoning model that autonomously deploys built-in search, memory, and code interpreter tools during inference, achieving a score of 49.8 on Humanity's Last Exam with search enabled.
```typescript
import { streamText } from 'ai';

const result = streamText({
  model: 'alibaba/qwen3-max-thinking',
  prompt: 'Why is the sky blue?',
});
```

Playground
Try out Qwen 3 Max Thinking by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About Qwen 3 Max Thinking
Qwen 3 Max Thinking extends the Qwen3-Max architecture with a dedicated extended reasoning mode and integrated autonomous tool use. When Qwen 3 Max Thinking encounters a question that exceeds its internal knowledge or requires computation, it independently decides whether to trigger its Search tool (for current information), its Memory tool (for cross-turn context persistence), or its Code Interpreter (for numerical verification and data processing), without you needing to specify which tool applies.
This autonomous tool selection is a meaningful architectural distinction. Rather than exposing tool invocation as an explicit user-facing control, Qwen 3 Max Thinking treats it as an internal reasoning step, making the interaction feel more like working with a capable assistant that knows when to check its work. The design is intended to reduce hallucination risk on factual queries by defaulting to retrieval when confidence is low, and to improve numerical accuracy by routing computations through an interpreter.
Qwen 3 Max Thinking's thinking mode exposes its reasoning chain before delivering a final answer, providing transparency into multi-step problem decomposition. On Humanity's Last Exam, a benchmark of approximately 3,000 graduate-level questions spanning mathematics, science, and engineering, Qwen 3 Max Thinking with search enabled scored 49.8, competitive with other models on the same benchmark in Alibaba's published comparisons. You can access Qwen 3 Max Thinking through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats supported by AI Gateway.
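As a concrete sketch of the Chat Completions route, the request is a standard OpenAI-style payload carrying the model slug above. The gateway URL shown here is an assumption for illustration; check your AI Gateway documentation for the actual endpoint.

```typescript
// Sketch: build an OpenAI-style Chat Completions request for the gateway.
// The base URL is an assumption -- verify it against the AI Gateway docs.
const GATEWAY_URL = 'https://ai-gateway.vercel.sh/v1/chat/completions';

interface ChatRequest {
  url: string;
  body: {
    model: string;
    messages: { role: 'user' | 'system'; content: string }[];
    stream: boolean;
  };
}

function buildChatRequest(prompt: string): ChatRequest {
  return {
    url: GATEWAY_URL,
    body: {
      model: 'alibaba/qwen3-max-thinking',
      messages: [{ role: 'user', content: prompt }],
      stream: true, // stream tokens as they are generated
    },
  };
}

// To send: POST req.body as JSON to req.url with an
// `Authorization: Bearer <api key>` header.
const req = buildChatRequest('Why is the sky blue?');
```

The same payload shape works across any OpenAI-compatible client; only the base URL and API key change.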
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
| Provider | Throughput (TPS, P50) | TTFT (ms, P50) | Success rate |
|---|---|---|---|

Throughput is P50 on live AI Gateway traffic, in tokens per second (TPS); TTFT is P50 time to first token, in milliseconds; success rate is the direct request success rate on AI Gateway and per provider. Visit the docs for more info.
What To Consider When Choosing a Provider
- Configuration: Thinking-mode responses can be substantially longer than standard completions; factor output token volume into per-request cost estimates across providers.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Qwen 3 Max Thinking
Best For
- STEM and engineering questions: Hard scientific, mathematical, or engineering problems that benefit from visible chain-of-thought reasoning
- Search-grounded research assistance: Workflows where real-time search reduces the risk of stale or invented information
- Self-correcting numerical agents: Automated agents that route calculations through a code interpreter to verify outputs
- Multi-turn personalized conversations: Extended sessions where the model must remember and adapt to user preferences
- Autonomous tool orchestration: Complex pipelines where the model decides which tool to invoke without explicit user prompting
Consider Alternatives When
- Tight token budgets: Thinking-mode responses consume substantially more output tokens than direct-answer models
- Low-latency requirements: Fast completions matter more than extended reasoning, which becomes unnecessary overhead
- Deterministic tool control: Autonomous tool selection may not align with strictly orchestrated pipelines that need manual control
- Creative or conversational tasks: Step-by-step reasoning adds friction rather than value in these contexts
Conclusion
Qwen 3 Max Thinking is built for situations where getting the right answer matters more than getting it quickly: graduate-level problem solving, real-time research synthesis, and agentic pipelines that need a model capable of recognizing its own knowledge gaps. The combination of autonomous tool orchestration and transparent reasoning chains fits complex, open-ended inference tasks.
Frequently Asked Questions
What tools does Qwen 3 Max Thinking invoke autonomously?
The model has three integrated tools: Search (for retrieving current information), Memory (for tracking user preferences and context across turns), and Code Interpreter (for executing code to verify or compute results). It selects among them without user prompting.
How does the thinking mode differ from standard completion mode?
In thinking mode, the model generates an extended internal reasoning trace before producing its final answer. This trace is visible in the response and reflects multi-step problem decomposition, which improves accuracy on hard reasoning tasks at the cost of more output tokens.
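When the trace arrives inline in the response text, Qwen-family models conventionally wrap it in `<think>…</think>` tags; whether your chosen API format does this, or instead returns a separate reasoning field, is an assumption to verify. A minimal splitter under that assumption:

```typescript
// Split an inline reasoning trace from the final answer.
// Assumes the trace is wrapped in <think>...</think> tags
// (a Qwen-family convention; some API formats return the
// trace as a separate field instead).
function splitThinking(raw: string): { reasoning: string; answer: string } {
  const match = raw.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) return { reasoning: '', answer: raw.trim() };
  const answer = raw.slice(match.index! + match[0].length).trim();
  return { reasoning: match[1].trim(), answer };
}

const out = splitThinking(
  '<think>Rayleigh scattering favors short wavelengths.</think>' +
    'The sky is blue because shorter wavelengths scatter more.'
);
// out.reasoning holds the trace; out.answer holds the final text.
```

Separating the two lets you log or display the trace without billing surprises leaking into downstream prompts.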
What did the model score on Humanity's Last Exam?
With its Search tool enabled, Qwen 3 Max Thinking scored 49.8 on Humanity's Last Exam, a benchmark of ~3,000 graduate-level questions across math, science, and engineering.
Which SDKs are compatible with this model via AI Gateway?
You can access Qwen 3 Max Thinking through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats supported by AI Gateway.
Does autonomous tool use mean I can't control when tools fire?
You can't manually control which tools fire. The model selects tools as an internal reasoning step. If your application requires deterministic, manually orchestrated tool calls, the non-thinking Qwen3-Max variant with explicit function-calling schemas is a better fit.
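For contrast, a deterministic pipeline using explicit function calling declares its tools up front in the request. The tool name and parameters below are hypothetical, and the non-thinking variant's model slug is assumed; check the model catalog for the exact identifier.

```typescript
// Hypothetical explicit tool declaration in OpenAI-style
// function-calling format: the application, not the model's
// internal reasoning, decides which tools exist and which one fires.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'lookup_order_status', // hypothetical tool name
      description: 'Fetch the current status of an order by ID.',
      parameters: {
        type: 'object',
        properties: {
          orderId: { type: 'string', description: 'Order identifier' },
        },
        required: ['orderId'],
      },
    },
  },
];

// A request that forces a call to that specific tool, rather than
// letting the model choose autonomously. Model slug is assumed.
const body = {
  model: 'alibaba/qwen3-max', // non-thinking variant (assumed slug)
  messages: [{ role: 'user' as const, content: 'Where is order 1234?' }],
  tools,
  tool_choice: {
    type: 'function' as const,
    function: { name: 'lookup_order_status' },
  },
};
```

With `tool_choice` pinned, the pipeline knows exactly which call comes back, which is what strictly orchestrated systems usually need.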
How does the Search tool reduce hallucinations?
When Qwen 3 Max Thinking's internal confidence on a factual claim is low, it automatically triggers a search query to retrieve current information rather than generating an answer from its internal knowledge alone, which reduces the frequency of confidently stated incorrect facts.
What is the token cost model for thinking-mode responses?
Thinking-mode completions generate reasoning traces that can be substantially longer than direct answers. Output tokens from both the reasoning chain and the final response are billed.
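As a back-of-envelope sketch of that billing model, total output cost scales with reasoning tokens plus answer tokens. The per-token rate below is a placeholder, not a real price:

```typescript
// Estimate per-request output cost. The rate is a hypothetical
// placeholder -- substitute your provider's actual pricing.
const OUTPUT_RATE_PER_MILLION = 6.0; // USD per 1M output tokens (placeholder)

function estimateOutputCost(reasoningTokens: number, answerTokens: number): number {
  // Both the reasoning trace and the final answer bill as output tokens.
  const totalOutput = reasoningTokens + answerTokens;
  return (totalOutput / 1_000_000) * OUTPUT_RATE_PER_MILLION;
}

// A response with a 4,000-token trace and a 500-token answer costs
// 9x a 500-token direct answer at the same (placeholder) rate:
const thinking = estimateOutputCost(4000, 500);
const direct = estimateOutputCost(0, 500);
```

The ratio, not the placeholder rate, is the point: long traces dominate per-request cost, so budget for them explicitly.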