
Qwen3 235B A22b Instruct 2507

alibaba/qwen-3-235b

Qwen3 235B A22b Instruct 2507 is Alibaba's large-scale 235B mixture-of-experts model with a context window of 131.1K tokens, activating 22 billion of 235 billion parameters per inference to deliver strong reasoning, coding, and multilingual performance.

Reasoning · Tool Use
index.ts

import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen-3-235b',
  prompt: 'Why is the sky blue?',
})

// Stream the response to stdout as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model on direct gateway requests (BYOK flows are not covered). See the ZDR documentation for configuration steps.

    Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

For applications with strict data sovereignty requirements, provider selection through AI Gateway lets you target infrastructure in specific regions without modifying your application's API calls.
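As an illustration, provider targeting can be expressed as per-request provider options. This is a minimal sketch assuming AI Gateway's `gateway.order` routing option; the provider slugs are illustrative and would be validated against the gateway's documentation:

```typescript
// Sketch of per-request provider routing, assuming AI Gateway's
// `gateway.order` option; the provider slugs here are illustrative.
const gatewayOptions = {
  gateway: {
    // Try Cerebras first, then fall back to DeepInfra.
    order: ['cerebras', 'deepinfra'],
  },
}

// Passed per request without changing the model call itself, e.g.:
// streamText({ model: 'alibaba/qwen-3-235b', prompt, providerOptions: gatewayOptions })
console.log(gatewayOptions.gateway.order.join(' -> ')) // cerebras -> deepinfra
```

Because the routing preference lives in request options rather than in the model call, switching regions or providers does not require touching application logic.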

When to Use Qwen3 235B A22b Instruct 2507

Best For

  • Complex reasoning tasks:

    When a problem genuinely requires the highest reasoning ceiling in the Qwen3 family, such as graduate-level mathematics, complex logical deduction chains, or multi-step proofs, this model is designed for it

  • Competitive coding challenges and complex software engineering:

    The model's benchmark results in coding tasks place it alongside large proprietary models, making it appropriate for problems that smaller models consistently struggle with

  • Enterprise agentic pipelines:

    With strong tool-calling capabilities and MCP support, Qwen3 235B A22b Instruct 2507 can serve as the planning and reasoning backbone for multi-step automated workflows where reliability matters

  • Multilingual content at scale:

    Covering 119 languages, this model suits global-facing products that need high-quality output across many language families without running separate per-language models

  • Research and evaluation baselines:

    As the largest-capability open-weight model in the Qwen3 Apache 2.0-licensed line, it's a useful reference point for capability assessments
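To ground the agentic use case above, here is a minimal, dependency-free sketch of a tool such an agent could call. The `checkInventory` function, its types, and the backing data are all hypothetical; with the AI SDK you would wrap a function like this with `tool()` and pass it to `streamText` via the `tools` option:

```typescript
// Hypothetical tool an agentic pipeline could expose to the model.
// The name, schema, and data below are illustrative, not from the source.
type InventoryQuery = { sku: string }
type InventoryResult = { sku: string; inStock: boolean; quantity: number }

// Stand-in for a real inventory service.
const inventory: Record<string, number> = { 'SKU-1001': 12, 'SKU-2002': 0 }

// The model would call this during a multi-step workflow and use the
// structured result to decide its next action.
async function checkInventory({ sku }: InventoryQuery): Promise<InventoryResult> {
  const quantity = inventory[sku] ?? 0
  return { sku, inStock: quantity > 0, quantity }
}
```

Keeping each tool a small, typed, pure-ish function like this makes the agent's individual steps testable independently of the model.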

Consider Alternatives When

  • Latency is a binding constraint:

    Larger models take longer to generate tokens. For interactive applications where response time outweighs peak quality, smaller Qwen3 variants or the 30B-A3B MoE model will serve users better

  • Cost per query needs to be minimized:

    The 235B model carries higher inference costs than smaller family members. If most queries are routine instruction-following tasks, a smaller model will deliver adequate quality at lower cost

  • Lighter-weight alternatives suffice:

    For teams that don't need the full capability ceiling of a 235B model, smaller dense variants like Qwen3-14B or Qwen3-32B handle many production workloads at lower per-token cost

Conclusion

Qwen3 235B A22b Instruct 2507 is the right choice when the task genuinely demands the highest capability in the Qwen3 family: complex reasoning, high-stakes coding, or multilingual generation at quality levels that smaller models can't reliably reach. Running it through AI Gateway means a single API integration covers provider failover, consolidated billing, and access to providers such as Novita, Cerebras, and DeepInfra without additional account management.

FAQ

How does the mixture-of-experts architecture keep a 235B model affordable to serve?

The mixture-of-experts (MoE) architecture routes each token through eight of 128 specialized experts per layer rather than the full parameter set. Inference compute scales with the activated parameters (22B), not the total (235B). This keeps a 235B-capacity model economically viable to serve while retaining the broad knowledge encoded in its full parameter space.
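As a back-of-envelope sketch of that cost argument, using the figures above (8 of 128 experts, 22B of 235B parameters); this simplifies the real architecture, which also carries always-active shared attention and embedding weights:

```typescript
// Rough MoE activation arithmetic using the figures from the answer above.
const totalParams = 235e9
const activeParams = 22e9
const expertsTotal = 128
const expertsActive = 8

// Fraction of total parameters touched per token.
const activeFraction = activeParams / totalParams // ≈ 0.094

// Fraction of experts routed per token.
const expertFraction = expertsActive / expertsTotal // = 0.0625

console.log(`~${(activeFraction * 100).toFixed(1)}% of parameters active per token`)
// prints "~9.4% of parameters active per token"
```

So per-token compute is closer to a ~22B dense model than to a 235B one, which is the entire economic case for sparse MoE serving.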

How does it perform on benchmarks?

Alibaba's benchmarks show competitive results against other strong reasoning models on coding, mathematics, and general capability evaluations. Specific numbers vary by benchmark.

How does it differ from Qwen3-32B?

Qwen3-32B is a dense model in which all 32 billion parameters activate on every inference, while Qwen3 235B A22b Instruct 2507 is a sparse MoE with roughly seven times the total capacity but fewer active parameters per token (22B versus 32B). The 235B variant reaches higher benchmark ceilings on the hardest tasks, while the 32B offers a simpler deployment profile.

Does it support a configurable thinking mode?

Yes. Thinking mode and non-thinking mode are both available, and you can configure a thinking budget per request. Smaller budgets reduce latency and cost; larger budgets let the model work through more complex reasoning steps before responding.

Which languages does it support?

Qwen3 235B A22b Instruct 2507 supports 119 languages and dialects across Indo-European, Sino-Tibetan, Afro-Asiatic, Austronesian, Dravidian, Turkic, and other language families.

Is Zero Data Retention available?

Yes, Zero Data Retention is available for this model. ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See the ZDR documentation at https://vercel.com/docs/ai-gateway/capabilities/zdr for setup instructions.