
Qwen 3 VL 235B A22B Instruct

Qwen 3 VL 235B A22B Instruct is Alibaba's 235B mixture-of-experts vision-language model with 22B active parameters per token, supporting interleaved text, images, and video over a context window of 262.1K tokens for visual coding, spatial perception, and fine-grained visual understanding.

Capabilities: Implicit Caching, Tool Use, Vision (Image)
index.ts
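import { streamText } from 'ai'
const result = streamText({
  model: 'alibaba/qwen3-vl-235b-a22b-instruct',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}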

Playground

Try out Qwen 3 VL 235B A22B Instruct by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.


Providers

AI Gateway can route requests for this model to multiple providers. To set a provider preference, use the provider's slug; see the AI Gateway docs for details. Using a provider means you agree to its terms, listed under Legal.

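| Provider | Context | Latency (P50 TTFT) | Throughput (P50) | Input (per 1M tokens) | Output (per 1M tokens) | Cache read (per 1M tokens) | Cache write (per 1M tokens) | Release date |
|---|---|---|---|---|---|---|---|---|
| DeepInfra | 262K | 0.5 s | 14 tokens/s | $0.20 | $0.88 | $0.11 | — | 09/23/2025 |
| Alibaba | 131K | 0.5 s | 47 tokens/s | $0.40 | $1.60 | — | — | 09/23/2025 |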
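Because prices are quoted per million tokens, the cost of one request is tokens × price / 1,000,000, summed over uncached input, cached input, and output. The minimal sketch below applies the table's prices. It assumes image and video inputs are billed as input tokens, and, as a labeled assumption, bills cached tokens at the normal input price when a provider lists no cache-read price.

```typescript
// USD prices per 1M tokens, copied from the provider table above.
interface Pricing {
  inputPerM: number;
  outputPerM: number;
  cacheReadPerM?: number; // undefined when the provider lists no cache-read price
}

const PRICES: { deepinfra: Pricing; alibaba: Pricing } = {
  deepinfra: { inputPerM: 0.2, outputPerM: 0.88, cacheReadPerM: 0.11 },
  alibaba: { inputPerM: 0.4, outputPerM: 1.6 },
};

interface Usage {
  inputTokens: number; // uncached input tokens (text plus image/video tokens)
  cachedInputTokens: number; // input tokens served from the provider's cache
  outputTokens: number;
}

// Estimated cost of one request in USD.
// Assumption: cached tokens fall back to the input price if no cache price is listed.
function requestCost(p: Pricing, u: Usage): number {
  const cachedRate = p.cacheReadPerM ?? p.inputPerM;
  return (
    (u.inputTokens * p.inputPerM +
      u.cachedInputTokens * cachedRate +
      u.outputTokens * p.outputPerM) /
    1_000_000
  );
}

const usage: Usage = { inputTokens: 10_000, cachedInputTokens: 0, outputTokens: 1_000 };
console.log(requestCost(PRICES.deepinfra, usage).toFixed(5)); // 0.00288
console.log(requestCost(PRICES.alibaba, usage).toFixed(5)); // 0.00560
```

At roughly $0.00288 per request of this shape on DeepInfra, the $5 monthly free credit covers about 1,700 requests (5 / 0.00288 ≈ 1,736).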
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
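The two P50 metrics combine into a rough estimate of how long a streamed answer takes: time to first token plus output length divided by throughput. The sketch below shows one common definition of P50 (the median, averaging the two middle samples for an even count; how AI Gateway computes it exactly is not stated here) and applies the estimate to the table's figures. Because the inputs are medians, this is a typical-case estimate, not a worst case.

```typescript
// Median (P50): roughly half the samples are at or below this value.
// For an even count this averages the two middle samples, one common convention.
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Approximate wall-clock time for a streamed response:
// time to first token + output tokens / decode throughput (ignores queueing and jitter).
function estimateSeconds(ttftSeconds: number, tokensPerSecond: number, outputTokens: number): number {
  return ttftSeconds + outputTokens / tokensPerSecond;
}

// P50 figures from the provider table, for a 1,000-token answer.
console.log(estimateSeconds(0.5, 14, 1_000).toFixed(1)); // 71.9 (DeepInfra)
console.log(estimateSeconds(0.5, 47, 1_000).toFixed(1)); // 21.8 (Alibaba)
```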

Uptime

Success rate of direct requests, for AI Gateway as a whole and per provider. Visit the docs for more info.

More models by Alibaba

| Context | Latency (P50 TTFT) | Throughput (P50) | Input (per 1M tokens) | Output (per 1M tokens) | Cache read (per 1M tokens) | Cache write (per 1M tokens) | Providers | Release date |
|---|---|---|---|---|---|---|---|---|
| 1M | 0.9 s | 360 tokens/s | $0.32 | $1.28 | $0.08 | $0.50 | Alibaba, Fireworks, Together AI | 06/02/2026 |
| 991K | 2.9 s | 55 tokens/s | $1.25 | $3.75 | $0.25 | $1.56 | Alibaba | 05/21/2026 |
| 1M | 0.3 s | 114 tokens/s | $0.50 | $3.00 | $0.10 | $0.63 | Alibaba, Fireworks, Together AI | 04/02/2026 |
| 1M | 0.7 s | 272 tokens/s | $0.10 | $0.40 | $0.00 | $0.13 | Alibaba | 02/24/2026 |
| 1M | 3.5 s | 55 tokens/s | $0.40 | $2.40 | $0.04 | $0.50 | Alibaba | 02/16/2026 |
| 33K | — | — | $0.05 | — | — | — | DeepInfra | 06/05/2025 |

About Qwen 3 VL 235B A22B Instruct

Qwen 3 VL 235B A22B Instruct is the September 23, 2025 version of Alibaba's 235B-A22B vision-language model in instruct configuration. Built on a mixture-of-experts (MoE) architecture, it has 235 billion total parameters, of which approximately 22 billion are active per token, and supports a context window of up to 262.1K tokens (131K when served by Alibaba) for interleaved sequences of text, images, and video frames.
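"235B total, about 22B active" means a router sends each token through only a few experts, so roughly 22 / 235 ≈ 9.4% of the weights take part in a given token's forward pass. The sketch below is a generic top-k MoE layer that illustrates the idea. The number of experts, k, the router weights, and the toy experts are all assumptions for illustration, not this model's actual configuration; in a real transformer this routing happens per token inside every MoE block.

```typescript
// A toy expert: maps a hidden vector to an output vector of the same length.
type Expert = (x: number[]) => number[];

const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// Generic top-k mixture-of-experts layer (toy sizes, not the real model's).
// routerWeights has one row per expert; only the k selected experts are evaluated.
function moeForward(x: number[], experts: Expert[], routerWeights: number[][], k: number): number[] {
  if (k < 1 || k > experts.length) throw new Error("k must be between 1 and the number of experts");
  // 1. Router: one score (logit) per expert.
  const logits = routerWeights.map((row) => dot(row, x));
  // 2. Keep the k highest-scoring experts.
  const top = logits
    .map((logit, index) => ({ logit, index }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);
  // 3. Softmax over the selected logits only (subtract the max for numerical stability).
  const exps = top.map(({ logit }) => Math.exp(logit - top[0].logit));
  const total = exps.reduce((s, e) => s + e, 0);
  // 4. Weighted sum of the selected experts' outputs; unselected experts never run.
  const out: number[] = new Array(x.length).fill(0);
  top.forEach(({ index }, j) => {
    const y = experts[index](x);
    for (let d = 0; d < out.length; d++) out[d] += (exps[j] / total) * y[d];
  });
  return out;
}

// Usage: 4 toy experts, expert i scales its input by (i + 1); count how many actually run.
let calls = 0;
const experts: Expert[] = [0, 1, 2, 3].map((i) => (x: number[]) => {
  calls++;
  return x.map((v) => v * (i + 1));
});
const router = [[0, 0], [1, 0], [0, 0], [0, 1]];
const mixed = moeForward([1, 2], experts, router, 2);
console.log(mixed, `experts evaluated: ${calls}`); // only 2 of the 4 experts run
```

Only the selected experts' weights are touched for this input, which is why compute per token tracks the active parameter count rather than the total.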

Compared with prior Qwen vision-language generations, the Qwen3-VL series improves visual coding, spatial perception, and fine-grained visual understanding. Qwen 3 VL 235B A22B Instruct parses charts, diagrams, GUI screenshots, and document images with stronger grounding, and can identify and reason about object positions and relationships within complex scenes.
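To use grounding results in code, you typically ask the model for object positions in a machine-readable form and map them back onto the image. The sketch below assumes a format you would request in your prompt: a JSON array of objects with a `label` and a `bbox_2d` of `[x1, y1, x2, y2]` on a 0 to 1000 relative grid. Both the field name and the grid are illustrative assumptions; check the Qwen3-VL model card for the format the model actually emits, and adjust `scale` if you use absolute pixels.

```typescript
// Assumed reply format (requested in the prompt, not guaranteed by the model):
// [{ "label": string, "bbox_2d": [x1, y1, x2, y2] }] on a 0 to `scale` grid.
interface NormalizedBox {
  label: string;
  bbox_2d: [number, number, number, number];
}

interface PixelBox {
  label: string;
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// Parses the span from the first "[" to the last "]" (tolerates prose or fences around it).
function parseBoxes(answer: string): NormalizedBox[] {
  const start = answer.indexOf("[");
  const end = answer.lastIndexOf("]");
  if (start === -1 || end < start) throw new Error("no JSON array in the answer");
  const parsed: unknown = JSON.parse(answer.slice(start, end + 1));
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array");
  return parsed as NormalizedBox[];
}

// Maps a relative box onto an image of the given pixel size.
function toPixels(box: NormalizedBox, width: number, height: number, scale = 1000): PixelBox {
  const [x1, y1, x2, y2] = box.bbox_2d;
  return {
    label: box.label,
    x1: Math.round((x1 * width) / scale),
    y1: Math.round((y1 * height) / scale),
    x2: Math.round((x2 * width) / scale),
    y2: Math.round((y2 * height) / scale),
  };
}

const answer = 'Found it: [{"label": "Submit button", "bbox_2d": [100, 200, 300, 400]}]';
const boxes = parseBoxes(answer).map((b) => toPixels(b, 1920, 1080));
console.log(boxes); // one box: (192, 216) to (576, 432) on a 1920x1080 screenshot
```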

The instruct configuration is tuned for direct instruction following rather than extended chain-of-thought, which makes Qwen 3 VL 235B A22B Instruct a practical default for production multimodal workloads: document intelligence, screen-reading agents, multi-image analysis, and visual coding pipelines that need fast, structured responses. You can integrate it through the AI SDK, the Chat Completions API, the Responses API, the Messages API, or other API formats, from TypeScript or Python. The maximum output is 262.1K tokens per request; because input and output share the context window, the practical limit depends on prompt size and on the serving provider's window.
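A request with an image can also go through the OpenAI-style Chat Completions API. The sketch below builds a body with one image part and one text part and posts it with `fetch`. The endpoint is an assumption based on AI Gateway's OpenAI-compatible base URL (`https://ai-gateway.vercel.sh/v1`), as is the `Bearer` API-key header; confirm both in the AI Gateway docs. The image URL is a placeholder.

```typescript
// Assumed endpoint: AI Gateway's OpenAI-compatible base URL plus /chat/completions.
const GATEWAY_CHAT_URL = "https://ai-gateway.vercel.sh/v1/chat/completions";
const MODEL = "alibaba/qwen3-vl-235b-a22b-instruct";

// OpenAI-style content parts: text and image URLs can be interleaved in one message.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
  max_tokens: number;
}

// Builds a Chat Completions body that asks one question about one image.
function buildImageQuestion(imageUrl: string, question: string, maxTokens = 1024): ChatRequest {
  return {
    model: MODEL,
    messages: [
      {
        role: "user",
        content: [
          { type: "image_url", image_url: { url: imageUrl } },
          { type: "text", text: question },
        ],
      },
    ],
    max_tokens: maxTokens,
  };
}

// Sends the request and returns the assistant's text; throws on non-2xx responses.
async function ask(apiKey: string, body: ChatRequest): Promise<string> {
  const res = await fetch(GATEWAY_CHAT_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Gateway returned HTTP ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0]?.message.content ?? "";
}

const request = buildImageQuestion("https://example.com/chart.png", "What trend does this chart show?");
// ask(apiKey, request).then(console.log);
```

Keeping the body builder separate from the network call makes the payload easy to inspect and test without an API key.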

What To Consider When Choosing a Provider

  • Configuration: Multimodal payloads that combine large images or video frames can consume a large share of the context window. Profile your typical request shape against each provider's window (262K tokens on DeepInfra, 131K on Alibaba) and confirm that the provider's throughput meets your target before routing production traffic; P50 throughput differs by more than 3× between the two (14 vs. 47 tokens/s).
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests; bring-your-own-key (BYOK) requests are not covered. See the AI Gateway documentation for configuration.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
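Profiling a request shape against the context window can be as simple as adding up token estimates and comparing them with each provider's limit. In the sketch below, the windows are the rounded values from the provider table, the slugs `deepinfra` and `alibaba` are assumed (copy the exact slugs from the Providers table), and the per-image and per-frame token counts are placeholders you would measure for your own inputs.

```typescript
// Context windows per provider, rounded as listed in the provider table.
// Assumption: these are the provider slugs; copy the exact ones from the Providers table.
const CONTEXT_WINDOW: Record<string, number> = {
  deepinfra: 262_000,
  alibaba: 131_000,
};

interface RequestShape {
  textTokens: number;
  images: number;
  tokensPerImage: number; // placeholder: depends on resolution and preprocessing
  videoFrames: number;
  tokensPerFrame: number; // placeholder, as above
  maxOutputTokens: number; // output shares the context window with the input
}

function requiredTokens(r: RequestShape): number {
  return r.textTokens + r.images * r.tokensPerImage + r.videoFrames * r.tokensPerFrame + r.maxOutputTokens;
}

// Provider slugs whose context window can hold the whole request, largest window first.
function providersThatFit(r: RequestShape): string[] {
  const needed = requiredTokens(r);
  return Object.entries(CONTEXT_WINDOW)
    .filter(([, size]) => size >= needed)
    .sort(([, a], [, b]) => b - a)
    .map(([slug]) => slug);
}

const multiImage: RequestShape = {
  textTokens: 2_000, images: 8, tokensPerImage: 1_280,
  videoFrames: 0, tokensPerFrame: 0, maxOutputTokens: 4_000,
};
const longVideo: RequestShape = {
  textTokens: 2_000, images: 0, tokensPerImage: 0,
  videoFrames: 64, tokensPerFrame: 2_000, maxOutputTokens: 4_000,
};
console.log(requiredTokens(multiImage), providersThatFit(multiImage)); // 16240 [ 'deepinfra', 'alibaba' ]
console.log(requiredTokens(longVideo), providersThatFit(longVideo)); // 134000 [ 'deepinfra' ]
```

The resulting list can serve as a starting point for your provider preference: requests that only fit the larger window should not be routed to a provider with the smaller one.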

When to Use Qwen 3 VL 235B A22B Instruct

Best For

  • Visual Coding Pipelines: Turning screenshots, mockups, or diagrams into accurate component or function output
  • Document Intelligence Tasks: Scanned pages, tables, and figures that need fine-grained visual perception
  • Screen-Reading Agents: Interpreting GUI screenshots to plan and execute UI actions
  • Multi-Image Comparative Analysis: Charts, product photos, or document figures reviewed side by side in a single request
  • Unified Multimodal Context: Combined text, image, and video inputs handled within the window of 262.1K tokens
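For multi-image comparisons, all images go into a single user message. The sketch below uses the same OpenAI-style content parts as a single-image Chat Completions request and precedes each image with a short text label, a prompt-design choice (not a model requirement) that lets the answer refer to images by name. The labels and URLs are placeholders.

```typescript
// OpenAI-style Chat Completions content parts.
type Part =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Labels each image in text just before it, then asks one question about all of them.
function buildComparison(images: { label: string; url: string }[], question: string): Part[] {
  const parts: Part[] = [];
  for (const image of images) {
    parts.push({ type: "text", text: `Image ${image.label}:` });
    parts.push({ type: "image_url", image_url: { url: image.url } });
  }
  parts.push({ type: "text", text: question });
  return parts;
}

const comparisonParts = buildComparison(
  [
    { label: "Q1", url: "https://example.com/revenue-q1.png" },
    { label: "Q2", url: "https://example.com/revenue-q2.png" },
  ],
  "Compare the two revenue charts and list the three largest changes.",
);
console.log(comparisonParts.length); // 5
```

The array becomes the `content` of one `{ role: "user" }` message, so the model sees every image in the same context.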

Consider Alternatives When

  • Visible Reasoning Required: Qwen3-VL-Thinking is a closer match when tasks need step-by-step visual reasoning traces
  • Text-Only Workloads: A dedicated text model offers lower cost per token when vision is never used
  • Latency-Critical Basic Tasks: A smaller multimodal model can serve simple instruction following at lower cost
  • Image Or Video Generation: A generation-class model fits tasks that produce pixels rather than read them
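The checklist above can be encoded as a small routing function that picks a model per task. Only the first model id below comes from this page; the others are hypothetical placeholders to replace with real ids from the model catalog, and the order of the checks is one reasonable reading of the list, not a prescribed policy.

```typescript
interface Task {
  hasImagesOrVideo: boolean;
  needsVisibleReasoning: boolean;
  generatesImagesOrVideo: boolean;
  simpleAndLatencyCritical: boolean;
}

// Only `instruct` is a real id from this page; the rest are hypothetical placeholders.
const MODELS = {
  instruct: "alibaba/qwen3-vl-235b-a22b-instruct",
  thinking: "<qwen3-vl-thinking-model-id>",
  textOnly: "<text-only-model-id>",
  smallVision: "<small-multimodal-model-id>",
  generator: "<image-or-video-generation-model-id>",
};

// Checks each "consider alternatives" condition in turn; falls back to the instruct model.
function chooseModel(t: Task): string {
  if (t.generatesImagesOrVideo) return MODELS.generator;
  if (!t.hasImagesOrVideo) return MODELS.textOnly;
  if (t.needsVisibleReasoning) return MODELS.thinking;
  if (t.simpleAndLatencyCritical) return MODELS.smallVision;
  return MODELS.instruct;
}

const documentQA: Task = {
  hasImagesOrVideo: true,
  needsVisibleReasoning: false,
  generatesImagesOrVideo: false,
  simpleAndLatencyCritical: false,
};
console.log(chooseModel(documentQA)); // alibaba/qwen3-vl-235b-a22b-instruct
```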

Conclusion

Qwen 3 VL 235B A22B Instruct is the pinned 235B-A22B instruct release in the Qwen3 vision-language line, suited to production multimodal workloads that need strong visual perception, spatial reasoning, and direct instruction following. Routing through AI Gateway gives you provider failover, unified billing, and a consistent integration surface across the Qwen3-VL family.