Qwen 3 VL 235B A22B Instruct
Qwen 3 VL 235B A22B Instruct is Alibaba's 235B mixture-of-experts vision-language model with 22B active parameters per token, supporting interleaved text, images, and video over a context window of 262.1K tokens for visual coding, spatial perception, and fine-grained visual understanding.
import { streamText } from 'ai'
const result = streamText({ model: 'alibaba/qwen3-vl-235b-a22b-instruct', prompt: 'Why is the sky blue?' })
About Qwen 3 VL 235B A22B Instruct
Qwen 3 VL 235B A22B Instruct is the instruct configuration of Alibaba's 235B-A22B vision-language model. Built on a mixture-of-experts (MoE) architecture, it has 235 billion total parameters, of which approximately 22 billion are active per token, and it offers a context window of 262.1K tokens for interleaved sequences of text, images, and video frames.
Compared with prior Qwen vision-language generations, the Qwen3 VL series brings improvements across visual coding, spatial perception, and fine-grained visual understanding. Qwen 3 VL 235B A22B Instruct parses charts, diagrams, GUI screenshots, and document images with stronger grounding, and can identify and reason about object positions and relationships within complex scenes.
The instruct configuration is tuned for direct instruction following rather than extended chain-of-thought. That makes Qwen 3 VL 235B A22B Instruct a practical default for production multimodal workloads that need fast, structured responses: document intelligence, screen-reading agents, multi-image analysis, and visual coding pipelines. You can integrate Qwen 3 VL 235B A22B Instruct through the AI SDK, the Chat Completions API, the Responses API, the Messages API, or other API formats, from TypeScript or Python, with a maximum output of 262.1K tokens per request.
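For the multi-image analysis case, a request typically sends one user message that interleaves a text question with several images. The following is a minimal sketch of building such a message. The helper name `buildImageQuestion`, the local type names, and the example.com URLs are hypothetical and chosen for illustration. The content-part shape (`type: 'text'` and `type: 'image'`) follows the AI SDK's message format as we understand it, so check it against the SDK version you use.

```typescript
// Local types mirroring the content-part shape assumed for AI SDK messages.
// These are illustrative definitions, not imports from the SDK.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: string };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Hypothetical helper: returns a single user message containing the
// question first, followed by one image part per URL, in the order given.
function buildImageQuestion(question: string, imageUrls: string[]): UserMessage[] {
  const imageParts = imageUrls.map(
    (url): ImagePart => ({ type: 'image', image: url }),
  );
  return [
    { role: 'user', content: [{ type: 'text', text: question }, ...imageParts] },
  ];
}

// Placeholder URLs; replace them with images your application can serve.
const messages = buildImageQuestion('Which chart shows the higher peak?', [
  'https://example.com/chart-a.png',
  'https://example.com/chart-b.png',
]);
```

Under these assumptions, the resulting array would be passed as `messages` in place of `prompt` in the `streamText` call, with the same model identifier, 'alibaba/qwen3-vl-235b-a22b-instruct'.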