Qwen3 Next 80B A3B Thinking is the reasoning-mode counterpart to Qwen3-Next-80B-A3B-Instruct. It shares the same hybrid Transformer-Mamba architecture: 48 layers arranged as 12 repeating blocks, each consisting of three Gated DeltaNet + MoE layers followed by one Gated Attention + MoE layer, with 512 total experts and only 10 activated per token. What distinguishes the Thinking variant is that thinking mode is the only mode: the model always generates a <think> reasoning trace before its final answer, and the recommended token budget for that trace ranges from 32,768 tokens for typical queries to 81,920 tokens for difficult mathematical or coding problems.
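To make the layer arrangement concrete, here is a small illustrative sketch derived only from the numbers above; the layer-type labels and constant names are descriptive placeholders, not identifiers from the released model code:

```python
# Illustrative sketch of the 48-layer hybrid stack described above.
# "gated_deltanet" and "gated_attention" are labels, not real module names.

BLOCKS = 12                       # 48 layers / 4 layers per block
PATTERN = ["gated_deltanet"] * 3 + ["gated_attention"]

layers = [kind for _ in range(BLOCKS) for kind in PATTERN]
assert len(layers) == 48
assert layers.count("gated_attention") == 12   # one standard-attention layer per block

# Every layer carries a Mixture-of-Experts FFN: 512 experts in total,
# only 10 routed to each token (hence ~3B of 80B parameters active, the "A3B").
TOTAL_EXPERTS = 512
ACTIVE_PER_TOKEN = 10
```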
This exclusive thinking mode is a deliberate design choice. Dropping mode switching specializes the model for tasks where getting the right answer matters more than minimizing output length. The linear-attention Gated DeltaNet layers keep context processing efficient even as reasoning traces push the total sequence length well beyond the prompt, which helps when reasoning chains grow long.
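A minimal usage sketch with Hugging Face Transformers follows, assuming the published model ID Qwen/Qwen3-Next-80B-A3B-Thinking and the </think> token ID (151668) documented in Qwen's model cards; the prompt and budget choices are illustrative, so verify the template details against the official documentation:

```python
# Hedged sketch: generate with the Thinking model and split the reasoning
# trace from the final answer. Model ID and </think> token ID are taken
# from Qwen's model cards and should be verified there.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Next-80B-A3B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Budget the reasoning trace: 32,768 tokens for typical queries,
# up to 81,920 for hard math or coding problems (per the recommendations above).
output = model.generate(**inputs, max_new_tokens=32768)
new_tokens = output[0][len(inputs.input_ids[0]):].tolist()

# The chat template opens the trace with <think>, so the generated text
# usually contains only the closing tag. Split at the last </think>
# (token ID 151668 in Qwen3 tokenizers, per the model card).
try:
    idx = len(new_tokens) - new_tokens[::-1].index(151668)
except ValueError:
    idx = 0  # no </think> found; treat all output as reasoning trace

thinking = tokenizer.decode(new_tokens[:idx], skip_special_tokens=True)
answer = tokenizer.decode(new_tokens[idx:], skip_special_tokens=True)
print("reasoning trace (truncated):", thinking.strip()[:500])
print("final answer:", answer.strip())
```

Splitting on the token ID rather than the decoded string avoids missing the delimiter if decoding strips or rewrites special tokens.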
Benchmark results reflect this specialization. Across math and coding benchmarks, the model outperforms its predecessors Qwen3-30B-A3B-Thinking-2507 and Qwen3-32B (in thinking mode), as well as several proprietary reasoning models in Qwen's published comparisons. See https://modelstudio.console.alibabacloud.com/?tab=doc#/doc/?type=model&url=2840914_2&modelId=qwen3-next-80b-a3b-thinking for detailed benchmark tables.