
Qwen3-30B-A3B

Qwen3-30B-A3B is a mixture-of-experts model from Alibaba that activates only 3 billion of its 30 billion parameters per inference, outperforming QwQ-32B while running at a fraction of the compute cost.

Capabilities: Reasoning · Tool Use
index.ts

import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen-3-30b',
  prompt: 'Why is the sky blue?',
})

// Consume the stream as tokens arrive
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • How is it possible for a 3B-active-parameter model to outperform QwQ-32B?

    The mixture-of-experts architecture separates total parameter count from inference compute. At inference, routing selects the most relevant 3 billion parameters for each token. The model benefits from the broad capacity of its 30 billion total parameters while keeping serving costs proportional to the 3B active count. QwQ-32B activates all 32 billion parameters but has less total representational capacity.
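    As a rough illustration of top-k expert routing (a generic sketch, not Qwen3's actual routing code; the expert count, k, and gating scores below are hypothetical), each token gets a gating score per expert and only the k highest-scoring experts run:

    ```typescript
    // Minimal sketch of top-k mixture-of-experts routing (illustrative only;
    // expert count, k, and scores are hypothetical, not Qwen3's real values).
    function topKExperts(gateScores: number[], k: number): number[] {
      // Rank expert indices by gating score and keep the k best
      return gateScores
        .map((score, idx) => ({ score, idx }))
        .sort((a, b) => b.score - a.score)
        .slice(0, k)
        .map((e) => e.idx)
    }

    // For each token, only the selected experts' parameters are used,
    // so per-token compute scales with k, not with the total expert count.
    const scores = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.4, 0.15]
    const active = topKExperts(scores, 2)
    ```

    Because the sort happens over scores rather than parameters, the unselected experts contribute capacity during training but cost nothing at this inference step.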

  • What does "A3B" mean in the model name?

    "A3B" indicates that 3 billion parameters are activated during inference (A = activated, 3B = 3 billion). The "30B" is the total parameter count across all expert layers.

  • How does the 30B-A3B architecture affect serving cost?

    At inference, only 3 billion parameters activate per token, so per-token compute is comparable to a 3B dense model even though the full MoE has 30 billion parameters. This is the source of the cost advantage over dense 32B-class models at similar quality.
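    As a back-of-the-envelope check (using the common ~2 FLOPs per active parameter per generated token rule of thumb, not vendor-published numbers), the per-token compute gap looks like:

    ```typescript
    // Rough per-token forward-pass FLOPs: ~2 * active parameters.
    // (Standard rule of thumb; ignores attention and sequence-length terms.)
    const flopsPerToken = (activeParams: number) => 2 * activeParams

    const qwen3MoE = flopsPerToken(3e9)   // 3B active of 30B total
    const dense32B = flopsPerToken(32e9)  // all 32B parameters active

    // The MoE does roughly 32/3 ≈ 10.7x less compute per generated token
    const ratio = dense32B / qwen3MoE
    ```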

  • How does the thinking budget control work in practice?

    You set a token budget for the thinking trace via the API. Higher budgets let the model explore more reasoning steps before producing its answer; lower budgets constrain the reasoning phase for faster responses, which is useful when a question is straightforward and extended reasoning wouldn't add value.

  • Does Qwen3-30B-A3B support the same 119 languages as other Qwen3 models?

    Yes. The 119-language coverage applies across the Qwen3 family, including this model.

  • What agentic use cases is this model suited for?

    The model supports tool calling and MCP (Model Context Protocol). It fits automated workflows where the model needs to select and invoke tools across multiple steps, particularly in cost-sensitive deployments where running a larger model per agent step would be prohibitive.
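    Schematically (a generic dispatch sketch, not the AI SDK's or MCP's actual API; the tool name and handler below are hypothetical), one agent step takes a model-proposed tool call, routes it to a registered handler, and returns the result for the model's next step:

    ```typescript
    // Generic tool-dispatch sketch. Tool names and handlers are hypothetical;
    // a real agent loop would receive ToolCall objects from the model.
    type ToolCall = { name: string; args: Record<string, unknown> }

    const tools: Record<string, (args: Record<string, unknown>) => string> = {
      // Hypothetical example tool
      getWeather: (args) => `Sunny in ${args.city}`,
    }

    function dispatch(call: ToolCall): string {
      const handler = tools[call.name]
      if (!handler) throw new Error(`Unknown tool: ${call.name}`)
      return handler(call.args) // result is fed back to the model as context
    }

    const observation = dispatch({ name: 'getWeather', args: { city: 'Berlin' } })
    ```

    In a cost-sensitive deployment, this dispatch loop runs once per agent step, which is where a cheap-per-token model pays off.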

  • How does AI Gateway route requests for this model?

    AI Gateway selects among the providers serving this model (currently DeepInfra) based on availability and performance. If a provider returns an error or is slow to respond, the request automatically retries with another provider in the pool, so your application doesn't need to implement retry logic itself.
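    Conceptually (a simplified sketch of gateway-style failover, not AI Gateway's internals; the provider functions are placeholders), the fallback behavior amounts to trying each provider in order until one succeeds:

    ```typescript
    // Simplified provider-failover sketch (illustrative, not AI Gateway's
    // actual implementation). Tries each provider until one succeeds.
    type Provider = (prompt: string) => Promise<string>

    async function withFailover(providers: Provider[], prompt: string): Promise<string> {
      let lastError: unknown
      for (const provider of providers) {
        try {
          return await provider(prompt) // first healthy provider wins
        } catch (err) {
          lastError = err // record the failure and try the next provider
        }
      }
      throw lastError // every provider in the pool failed
    }
    ```

    Because the gateway performs this loop internally, application code issues a single request and sees only the final result.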