Skip to content

Qwen3 VL 235B A22B Instruct

Qwen3 VL 235B A22B Instruct is Alibaba's multimodal vision-language model supporting interleaved text, images, and video over a native context of 262.1K tokens, with architectural improvements in spatial-temporal modeling and agentic GUI interaction.

Vision (Image)
index.ts
import { streamText } from 'ai'
const result = streamText({
model: 'alibaba/qwen3-vl-instruct',
prompt: 'Why is the sky blue?'
})