
Qwen3 Next 80B A3B Instruct

Qwen3 Next 80B A3B Instruct is an 80-billion-parameter hybrid Transformer-Mamba model that activates only about 3 billion parameters per token, delivering roughly 10x the inference throughput of comparable dense models, with a native context window of 262,144 tokens.

index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3-next-80b-a3b-instruct',
  prompt: 'Why is the sky blue?',
})

// Stream the generated text to stdout as it arrives
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • What does "80B-A3B" mean in this model's name?

    "80B" refers to the 80 billion total parameters in the MoE pool; "A3B" indicates that roughly 3 billion of them are activated per token. Only 10 of the 512 routed experts (plus one shared expert) fire for each token, so about 3.75% of the parameters (3B of 80B) are active at any step.
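    The arithmetic behind those figures can be sanity-checked in a few lines (the constants below simply restate the numbers quoted above):

    ```typescript
    // Sanity-check the sparsity figures quoted above (illustrative only).
    const totalParams = 80e9  // 80B total parameters in the MoE pool
    const activeParams = 3e9  // ~3B parameters activated per token
    const totalExperts = 512  // routed experts available per MoE layer
    const routedExperts = 10  // routed experts selected per token

    // Parameter activation ratio: 3B / 80B = 3.75%
    const paramRatio = activeParams / totalParams

    // Expert activation ratio: 10 / 512 ≈ 1.95%
    const expertRatio = routedExperts / totalExperts

    console.log(`param ratio: ${(paramRatio * 100).toFixed(2)}%`)
    console.log(`expert ratio: ${(expertRatio * 100).toFixed(2)}%`)
    ```

    Note that the two ratios differ: the 3.75% figure counts activated parameters, while the fraction of experts selected per token is smaller.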

  • How does Hybrid Transformer-Mamba architecture affect performance on long contexts?

    The architecture interleaves Gated DeltaNet (a linear-attention mechanism) with standard Gated Attention layers. Linear attention scales linearly rather than quadratically with sequence length, so the model can process 262,144-token sequences with far less compute than a fully quadratic-attention model.
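    A back-of-the-envelope comparison shows why linear attention matters at this scale (the per-token "work units" below are placeholders for illustration, not measured FLOPs):

    ```typescript
    // Rough scaling comparison: full attention does O(n^2) work per layer
    // over the sequence, while a linear-attention layer such as Gated
    // DeltaNet does O(n). Units here are arbitrary and illustrative.
    const contextLen = 262_144 // native context window in tokens

    const quadraticOps = contextLen * contextLen // ~6.9e10 work units
    const linearOps = contextLen                 // ~2.6e5 work units

    // At the full native context, a quadratic layer does ~262,144x
    // more work than a linear one.
    const ratio = quadraticOps / linearOps
    console.log(ratio)
    ```

    Real speedups are smaller than this idealized ratio, since the model still keeps some full-attention layers and other costs (MLPs, KV movement) do not vanish.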

  • What is the throughput advantage over a dense model?

    On sequences of 32K tokens or longer, Qwen3 Next 80B A3B Instruct delivers roughly 10x the throughput of the comparable dense Qwen3-32B model, according to the model's published technical specifications.

  • Does this model support a thinking or reasoning mode?

    No. The Instruct variant is optimized for direct instruction following and does not emit thinking traces; the separate Qwen3-Next-80B-A3B-Thinking variant provides a dedicated reasoning mode.

  • What is the maximum context length?

    The native context window is 262,144 tokens. With YaRN RoPE scaling this can be extended to roughly one million tokens; the model reports 80.3% accuracy on the RULER benchmark at 1M-token context.
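    The extension factor implied by those numbers works out neatly, since YaRN multiplies the native context by a scaling factor (the factor of 4 below is an assumption consistent with 262,144 growing to just over one million, not a documented deployment setting):

    ```typescript
    // YaRN extends the usable context by applying a scaling factor to
    // RoPE position embeddings. A factor of 4 (assumed here) takes the
    // native window to just over one million tokens.
    const nativeContext = 262_144
    const yarnFactor = 4
    const extendedContext = nativeContext * yarnFactor
    console.log(extendedContext) // 1048576, i.e. ~1M tokens
    ```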

  • What benchmarks has this model been evaluated on?

    Key reported scores include MMLU-Pro (80.6), MMLU-Redux (90.9), GPQA (72.9), AIME25 (69.5), LiveCodeBench (56.6), Arena-Hard v2 (82.7), and BFCL-v3 (70.3) for function calling.

  • Is Multi-Token Prediction supported?

    Yes. The model is trained with a Multi-Token Prediction objective, which can be used for speculative decoding to accelerate inference beyond the throughput gains already provided by the sparse MoE architecture.
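    The page doesn't spell out how Multi-Token Prediction speeds up decoding, but the usual mechanism is speculative decoding: a cheap draft (here, the MTP head) proposes several tokens and the full model verifies them in one pass. A toy sketch of the accept/reject loop, where `draft` and `target` are stand-in functions over a canned token sequence, not the real model API:

    ```typescript
    // Toy speculative-decoding loop: a cheap draft function proposes k
    // tokens, an expensive target function verifies them, and the longest
    // agreeing prefix is accepted. All functions are illustrative stand-ins.
    type Token = string

    // Stand-in draft model: guesses the next k tokens from the context.
    function draft(context: Token[], k: number): Token[] {
      const canned = ['the', 'sky', 'is', 'blue', 'because']
      return canned.slice(context.length, context.length + k)
    }

    // Stand-in target model: the "ground truth" next token for a context.
    function target(context: Token[]): Token {
      const truth = ['the', 'sky', 'is', 'blue', 'today']
      return truth[context.length] ?? '<eos>'
    }

    // Accept drafted tokens while the target agrees, then emit one more
    // verified token. Each iteration costs one target pass but can emit
    // up to k + 1 tokens, which is where the speedup comes from.
    function speculativeStep(context: Token[], k: number): Token[] {
      const proposed = draft(context, k)
      const accepted: Token[] = []
      for (const tok of proposed) {
        if (target([...context, ...accepted]) === tok) accepted.push(tok)
        else break
      }
      accepted.push(target([...context, ...accepted]))
      return accepted
    }

    console.log(speculativeStep([], 4)) // ['the', 'sky', 'is', 'blue', 'today']
    ```

    In this toy run the draft agrees on its first four tokens, so five tokens are emitted for a single verification pass; when the draft disagrees early, the loop degrades gracefully to one token per pass.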