Qwen3 Next 80B A3B Thinking
Qwen3 Next 80B A3B Thinking is a hybrid Transformer-Mamba reasoning model that combines 80 billion total parameters (3B active per token) with a dedicated thinking mode, achieving strong results on AIME25 while supporting ultra-long contexts of 131.1K tokens.
import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3-next-80b-a3b-thinking',
  prompt: 'Why is the sky blue?',
})

Frequently Asked Questions
Why does this model only support thinking mode, not a standard non-thinking mode?
This variant is specialized for complex reasoning. By committing entirely to thinking mode, it avoids the quality compromises that come from training a single model to switch between reasoning and direct-answer behaviors.
How long can the thinking trace be?
The recommended budget is 32,768 tokens for typical queries and up to 81,920 tokens for complex mathematics or coding problems. These are recommendations; actual trace length is determined by the model based on problem complexity.
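As a rough illustration only, the snippet below simply raises the output-token cap so a long thinking trace plus the final answer fits within it; the option is named maxOutputTokens in recent AI SDK releases (maxTokens in older ones), and the prompt and numbers here are illustrative, not prescribed values.

import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3-next-80b-a3b-thinking',
  prompt: 'Prove there are infinitely many primes of the form 4k + 3.',
  // ~82K-token thinking budget for a hard problem, plus headroom for the final answer
  maxOutputTokens: 81920 + 4096,
})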
How does the AIME25 score compare to other models in the family?
The Thinking variant outperforms the Instruct variant's 69.5% on AIME25, and also surpasses Qwen3-30B-A3B-Thinking-2507 and several proprietary reasoning models in Qwen's published comparisons on this benchmark. See https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?type=model&url=2840914_2&modelId=qwen3-next-80b-a3b-thinking for specific scores.
Does the Hybrid Transformer-Mamba architecture help during reasoning?
Yes. The linear-attention Gated DeltaNet layers let the model handle sequences that grow long during reasoning (the prompt plus an extended thinking trace) at sub-quadratic cost compared to full attention. This keeps generation efficient even for hard problems that trigger long traces.
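For intuition only, here is a deliberately simplified TypeScript sketch of a gated delta-rule recurrence (our own illustration, not the fused GPU kernels the model actually uses): the state is a fixed-size matrix updated once per token, so the per-token cost does not grow with how long the prompt and thinking trace already are.

type Vec = number[]
type Mat = number[][] // dV rows x dK columns: the recurrent "fast weight" state

// Read-out: multiply the state matrix by a vector.
function matVec(S: Mat, x: Vec): Vec {
  return S.map(row => row.reduce((acc, s, j) => acc + s * x[j], 0))
}

// One token of a gated delta-rule update. Cost is O(dK * dV) per token,
// regardless of how many tokens came before.
function gatedDeltaStep(
  S: Mat, q: Vec, k: Vec, v: Vec,
  alpha: number, // decay gate in (0, 1): forgets stale associations
  beta: number,  // write gate: how strongly to store the new association
): { S: Mat; o: Vec } {
  const pred = matVec(S, k)                   // what the state predicts for this key
  const err = v.map((vi, i) => vi - pred[i])  // delta-rule prediction error
  const next = S.map((row, i) => row.map((s, j) => alpha * s + beta * err[i] * k[j]))
  return { S: next, o: matVec(next, q) }      // output for the current query
}

Full attention, by contrast, revisits every earlier token's keys and values at each step, which is what makes very long traces expensive.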
What is the native context length?
The native context is 131.1K tokens, extensible to approximately one million tokens via YaRN RoPE scaling. This allows the model to reason over very long input documents alongside its own thinking trace.
How should I parse the thinking content from responses?
The model outputs its reasoning between <think> and </think> tags before the final answer. If the opening <think> tag is missing, locate the closing </think> token (as Qwen's reference parsers do) and split there: everything before it is thinking content, everything after is the final response.
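A minimal sketch of that split (our own helper, not part of any SDK), assuming you already have the complete response text:

// Split a raw completion into thinking content and final answer.
// Handles both "<think>...</think>answer" and "...</think>answer" (opening
// tag omitted), and falls back to treating everything as the answer if no
// closing tag is present.
function splitThinking(raw: string): { thinking: string; answer: string } {
  const CLOSE = '</think>'
  const end = raw.indexOf(CLOSE)
  if (end === -1) {
    return { thinking: '', answer: raw.trim() }
  }
  const head = raw.slice(0, end)
  const OPEN = '<think>'
  const start = head.indexOf(OPEN)
  const thinking = (start === -1 ? head : head.slice(start + OPEN.length)).trim()
  const answer = raw.slice(end + CLOSE.length).trim()
  return { thinking, answer }
}

For streaming output, the same logic can be applied by buffering chunks until the closing tag arrives and treating everything after it as the visible answer.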
How does this model compare to Qwen3-Max-Thinking for reasoning tasks?
Both models support extended reasoning, but they represent different architectural tradeoffs. Qwen3 Next 80B A3B Thinking uses a sparse hybrid architecture optimized for throughput on long sequences; Qwen3-Max-Thinking is a trillion-parameter model with autonomous tool invocation. The right choice depends on whether autonomous search/code execution or architecture-driven efficiency is more valuable for your workload.