
Qwen3 235B A22B

Qwen3 235B A22B is Alibaba's large-scale 235B mixture-of-experts model with a context window of 131.1K tokens, activating 22 billion of 235 billion parameters per inference to deliver strong reasoning, coding, and multilingual performance.

Tool Use
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen-3-235b',
  prompt: 'Why is the sky blue?',
})
// Print the streamed response as it arrives
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Frequently Asked Questions

  • Why does a 235B model only activate 22B parameters per inference?

    The mixture-of-experts (MoE) architecture routes each token through a small subset of specialized experts (8 of 128 in each MoE layer) rather than the full parameter set. Inference compute therefore scales with the activated parameters (22B), not the total (235B), which keeps a 235B-capacity model economically viable to serve while retaining the broad knowledge encoded in its full parameter space. A toy routing sketch appears after this FAQ.

  • How does Qwen3 235B A22B perform against other large language models on benchmarks?

    Alibaba's benchmarks show competitive results against other strong reasoning models on coding, mathematics, and general capability evaluations. Specific numbers vary by benchmark.

  • What distinguishes the 235B MoE model from Qwen3-32B in the same family?

    Qwen3-32B is a dense model where all 32 billion parameters activate on every inference, while Qwen3 235B A22B is a sparse MoE with roughly seven times the total capacity but fewer active parameters per token (22B versus 32B). The 235B variant reaches higher benchmark ceilings on the hardest tasks, while the 32B offers a simpler deployment profile.

  • Can I control how much reasoning compute the model uses per request?

    Yes. Thinking mode and non-thinking mode are both available, and you can configure a thinking budget per request. Smaller budgets reduce latency and cost; larger budgets allow the model to work through more complex reasoning steps before responding. A hedged request sketch appears after this FAQ.

  • What languages does this model support?

    Qwen3 235B A22B supports 119 languages and dialects across Indo-European, Sino-Tibetan, Afro-Asiatic, Austronesian, Dravidian, Turkic, and other language families.

  • Is Zero Data Retention available for this model?

    Yes, Zero Data Retention is available for this model. ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See the ZDR documentation at https://vercel.com/docs/ai-gateway/capabilities/zdr for setup instructions.
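
The following TypeScript sketch illustrates the top-k expert routing described in the MoE answer above. It is illustrative only: the expert count (128) and top-k (8) match the figures cited in that answer, but the gating math is a simplified stand-in, not Qwen3's actual implementation.

// Toy top-k gating: pick which experts handle one token.
// Constants mirror the figures cited above (128 experts per MoE layer, 8 active).
const NUM_EXPERTS = 128
const TOP_K = 8

// Softmax over gate logits gives each expert a routing weight.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits)
  const exps = logits.map((x) => Math.exp(x - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((x) => x / sum)
}

// Return the indices of the k highest-weight experts.
function topKExperts(weights: number[], k: number): number[] {
  return weights
    .map((weight, index) => ({ weight, index }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, k)
    .map((entry) => entry.index)
}

// Route one token: only the selected experts run, so per-token compute
// scales with the ~22B activated parameters, not the full 235B.
const gateLogits = Array.from({ length: NUM_EXPERTS }, () => Math.random())
const activeExperts = topKExperts(softmax(gateLogits), TOP_K)
console.log('Experts activated for this token:', activeExperts)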
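
The sketch below shows one way a per-request thinking budget might be passed through the AI SDK. The providerOptions keys used here (enable_thinking, thinking_budget) and the alibaba provider key are assumptions modeled on Qwen's published API parameters, not confirmed AI Gateway option names; check the Gateway and provider documentation for the exact fields.

// Hedged sketch: per-request reasoning controls via providerOptions.
// The option names below are assumptions; verify them against the
// AI Gateway and provider docs before relying on them.
import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen-3-235b',
  prompt: 'Prove that the square root of 2 is irrational.',
  providerOptions: {
    alibaba: {
      enable_thinking: true, // assumed flag for thinking mode
      thinking_budget: 4096, // assumed cap on reasoning tokens
    },
  },
})
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}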