Qwen 3 VL 235B A22B Instruct
Qwen 3 VL 235B A22B Instruct is Alibaba's 235B mixture-of-experts vision-language model with 22B active parameters per token, supporting interleaved text, images, and video over a context window of 262.1K tokens for visual coding, spatial perception, and fine-grained visual understanding.
import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3-vl-235b-a22b-instruct',
  prompt: 'Why is the sky blue?',
})
Playground
Try out Qwen 3 VL 235B A22B Instruct by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
About Qwen 3 VL 235B A22B Instruct
Qwen 3 VL 235B A22B Instruct is the instruct configuration of Alibaba's 235B-A22B vision-language model. Built on a mixture-of-experts (MoE) architecture, it carries 235 billion total parameters with approximately 22 billion active per token, and serves a context window of 262.1K tokens for interleaved sequences of text, images, and video frames.
Compared with prior Qwen vision-language generations, the Qwen3 VL series brings improvements across visual coding, spatial perception, and fine-grained visual understanding. Qwen 3 VL 235B A22B Instruct parses charts, diagrams, GUI screenshots, and document images with stronger grounding, and can identify and reason about object positions and relationships within complex scenes.
The instruct configuration is tuned for direct instruction following rather than extended chain-of-thought, which makes Qwen 3 VL 235B A22B Instruct a practical default for production multimodal workloads: document intelligence, screen-reading agents, multi-image analysis, and visual coding pipelines that need fast, structured responses. You can integrate Qwen 3 VL 235B A22B Instruct through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats, from TypeScript or Python, with a maximum output of 262.1K tokens per request.
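As a rough illustration of a multi-image request over the Chat Completions format, the sketch below only builds the request body and does not send it. It assumes the gateway accepts OpenAI-style `text` and `image_url` content parts for this model. The helper name `buildVisionRequest` and the example URLs are hypothetical.

```typescript
// Minimal types for an OpenAI-style Chat Completions multimodal message.
// Assumption: the gateway accepts these content-part shapes for this model.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image_url'; image_url: { url: string } };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

interface ChatRequest {
  model: string;
  messages: UserMessage[];
  stream: boolean;
}

// Hypothetical helper: pairs one instruction with one or more images.
function buildVisionRequest(instruction: string, imageUrls: string[]): ChatRequest {
  if (imageUrls.length === 0) {
    throw new Error('Expected at least one image URL');
  }
  const images: ImagePart[] = imageUrls.map((url) => ({
    type: 'image_url',
    image_url: { url },
  }));
  return {
    model: 'alibaba/qwen3-vl-235b-a22b-instruct',
    messages: [
      { role: 'user', content: [{ type: 'text', text: instruction }, ...images] },
    ],
    stream: false,
  };
}

const body = buildVisionRequest('Compare the two charts and list the differences.', [
  'https://example.com/chart-a.png', // hypothetical URL
  'https://example.com/chart-b.png', // hypothetical URL
]);
console.log(JSON.stringify(body, null, 2));
```

Placing the instruction first and the images after it keeps the payload easy to read. Check the gateway documentation for the exact endpoint and any limits on image count or size before relying on this shape.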
What To Consider When Choosing a Provider
- Configuration: Multimodal payloads that combine large images or video frames consume meaningful context tokens. Profile your typical request shape against the context window of 262.1K tokens and confirm your provider's serving infrastructure handles your throughput target before routing production traffic.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
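One way to profile a request shape against the context window is a simple budget check like the sketch below. It rests on several assumptions: that 262.1K is the rounded form of 262,144 tokens, that reserved output tokens count against the same window, and that you supply your own measured average for tokens per image. The `tokensPerImage` value in the example is illustrative only, not a figure for this model.

```typescript
// Assumption: 262.1K is the rounded form of 262,144 (2^18) tokens.
const CONTEXT_WINDOW_TOKENS = 262_144;

interface RequestShape {
  textTokens: number;           // estimated prompt text tokens
  imageCount: number;           // images or sampled video frames in the request
  tokensPerImage: number;       // hypothetical average; measure this on your own inputs
  reservedOutputTokens: number; // room to leave for the response
}

interface BudgetReport {
  inputTokens: number;
  totalTokens: number;
  remainingTokens: number;
  fits: boolean;
}

// Hypothetical helper: adds up the estimated token cost and compares it to the window.
function checkContextBudget(
  shape: RequestShape,
  windowTokens: number = CONTEXT_WINDOW_TOKENS,
): BudgetReport {
  const inputTokens = shape.textTokens + shape.imageCount * shape.tokensPerImage;
  const totalTokens = inputTokens + shape.reservedOutputTokens;
  const remainingTokens = windowTokens - totalTokens;
  return { inputTokens, totalTokens, remainingTokens, fits: remainingTokens >= 0 };
}

const report = checkContextBudget({
  textTokens: 2_000,
  imageCount: 40,
  tokensPerImage: 1_200, // illustrative value only
  reservedOutputTokens: 4_096,
});
console.log(report);
```

Running a check like this on representative traffic shows how much headroom a typical request leaves. It is an estimate only, since real token counts depend on image resolution and the model's tokenizer.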
When to Use Qwen 3 VL 235B A22B Instruct
Best For
- Visual Coding Pipelines: Turning screenshots, mockups, or diagrams into accurate component or function output
- Document Intelligence Tasks: Scanned pages, tables, and figures that need fine-grained visual perception
- Screen-Reading Agents: Interpreting GUI screenshots to plan and execute UI actions
- Multi-Image Comparative Analysis: Charts, product photos, or document figures reviewed side by side in a single request
- Unified Multimodal Context: Combined text, image, and video inputs handled within the window of 262.1K tokens
Consider Alternatives When
- Visible Reasoning Required: Qwen3-VL-Thinking is a closer match when tasks need step-by-step visual reasoning traces
- Text-Only Workloads: A dedicated text model offers lower cost per token when vision is never used
- Latency-Critical Basic Tasks: A smaller multimodal model can serve simple instruction following at lower cost
- Image Or Video Generation: A generation-class model fits tasks that produce pixels rather than read them
Conclusion
Qwen 3 VL 235B A22B Instruct is the pinned 235B-A22B instruct release in the Qwen3 vision-language line, suited to production multimodal workloads that need strong visual perception, spatial reasoning, and direct instruction following. Routing through AI Gateway gives you provider failover, unified billing, and a consistent integration surface across the Qwen3-VL family.
Frequently Asked Questions
What modalities does Qwen 3 VL 235B A22B Instruct accept?
The model accepts interleaved text, images, and video frames within a single context window of up to 262.1K tokens, with output of up to 262.1K tokens per request.
How is Qwen 3 VL 235B A22B Instruct different from Qwen3-VL-Thinking?
The instruct variant produces direct, structured answers and is generally faster and cheaper to run. Qwen 3 VL 235B A22B Instruct is tuned for instruction following without extended reasoning traces, while Qwen3-VL-Thinking emits a visible chain-of-thought before its final response and is better suited to complex visual STEM and compositional reasoning.
What does the 235B-A22B notation mean?
Qwen 3 VL 235B A22B Instruct is a mixture-of-experts model with 235 billion total parameters, of which approximately 22 billion activate per token. This design keeps the active compute close to a 22B dense model while preserving the capability profile of a much larger network.
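To make the arithmetic concrete, the sketch below parses the notation and computes the share of parameters active per token. The helper `parseMoeNotation` is hypothetical, and the result uses the approximate figures from the name rather than exact parameter counts.

```typescript
interface MoeSize {
  totalBillions: number;
  activeBillions: number;
  activeFraction: number;
}

// Hypothetical helper: reads labels of the form "<total>B-A<active>B".
function parseMoeNotation(label: string): MoeSize {
  const match = /^(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B$/.exec(label);
  if (!match) {
    throw new Error(`Unrecognized notation: ${label}`);
  }
  const totalBillions = Number(match[1]);
  const activeBillions = Number(match[2]);
  return { totalBillions, activeBillions, activeFraction: activeBillions / totalBillions };
}

const size = parseMoeNotation('235B-A22B');
// 22 / 235 is roughly 0.094, so about 9.4% of parameters are active per token.
console.log(`${(size.activeFraction * 100).toFixed(1)}% of parameters active per token`);
```

The small active share is why per-token compute is closer to that of a 22B dense model, although all 235 billion parameters still have to be held in memory by the serving provider.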
Which API surfaces work with Qwen 3 VL 235B A22B Instruct on AI Gateway?
You can call Qwen 3 VL 235B A22B Instruct through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats, from TypeScript or Python. Reference `alibaba/qwen3-vl-235b-a22b-instruct` as the model identifier in your request.
What is the context window for Qwen 3 VL 235B A22B Instruct?
The context window is 262.1K tokens, which applies to the combined sequence of text tokens and visual tokens (image patches and video frames encoded as tokens) in an interleaved request.
Does Qwen 3 VL 235B A22B Instruct support zero data retention?
Yes, Zero Data Retention is available for this model. Zero Data Retention is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.
Where can I see live latency and cost data for Qwen 3 VL 235B A22B Instruct?
This page shows live throughput, time-to-first-token, and pricing metrics for Qwen 3 VL 235B A22B Instruct measured across real AI Gateway traffic.