Qwen 3.5 Flash is built on Alibaba's fifth-generation Qwen3.5 architecture, which combines Gated DeltaNet linear attention with sparse mixture-of-experts layers in a 3:1 linear-to-full attention ratio. This design lets the model process very long documents and codebases efficiently while keeping inference costs low; the hosted Flash tier makes 1M-token contexts the default rather than an opt-in premium.
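To make the 3:1 ratio concrete, here is a minimal sketch of how such an interleave could be laid out. The repeating block layout below is an assumption for exposition, not the published Qwen 3.5 layer map:

```python
# Illustrative sketch of a 3:1 linear-to-full attention interleave.
# Assumption: layers repeat in blocks of three Gated DeltaNet (linear)
# layers followed by one full-attention layer.

def attention_pattern(num_layers: int, linear_per_full: int = 3) -> list[str]:
    """Return the attention type for each layer index."""
    block = ["linear"] * linear_per_full + ["full"]
    return [block[i % len(block)] for i in range(num_layers)]

print(attention_pattern(8))
# ['linear', 'linear', 'linear', 'full', 'linear', 'linear', 'linear', 'full']
```

Under this layout, only every fourth layer pays the quadratic cost of full attention, which is where the long-context efficiency comes from.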
The model handles text, images, and video natively in a single forward pass, without requiring separate vision adapters. That native multimodality makes it well-suited for workflows that mix screenshot analysis, document review, and code generation in the same conversation. Structured outputs, tool calling, and seed-based reproducibility are all supported out of the box.
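A request that combines these features might look like the following sketch, assuming an OpenAI-compatible chat endpoint. The model identifier and the tool schema are illustrative placeholders, not official names:

```python
# Sketch of a chat request combining tool calling, structured output,
# and a fixed seed. Assumptions: "qwen3.5-flash" as the model id and
# "lookup_symbol" as a hypothetical tool, both for illustration only.

import json

request = {
    "model": "qwen3.5-flash",                    # assumed model identifier
    "seed": 42,                                  # seed-based reproducibility
    "response_format": {"type": "json_object"},  # structured output
    "tools": [{
        "type": "function",
        "function": {
            "name": "lookup_symbol",             # hypothetical tool
            "description": "Find where a code symbol is defined.",
            "parameters": {
                "type": "object",
                "properties": {"symbol": {"type": "string"}},
                "required": ["symbol"],
            },
        },
    }],
    "messages": [
        {"role": "user", "content": "Where is parse_config defined?"},
    ],
}

print(json.dumps(request, indent=2))
```

Because all three features live in one request body, an agent loop can keep a single payload template and vary only the messages.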
Qwen 3.5 Flash ships with configurable reasoning depth, letting callers dial up or down the amount of internal chain-of-thought the model performs before responding. At lower reasoning settings the model behaves like a fast instruction-follower; at higher settings it performs multi-step decomposition suitable for mathematical problem solving or complex agentic tasks.
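The two ends of that dial can be sketched as below. The parameter name `reasoning_effort` and its values are assumptions for illustration; the hosted API may expose this knob under a different name:

```python
# Sketch of selecting reasoning depth per request. `reasoning_effort`
# is a hypothetical parameter name used here for illustration.

def make_request(prompt: str, effort: str) -> dict:
    assert effort in {"low", "medium", "high"}
    return {
        "model": "qwen3.5-flash",    # assumed model identifier
        "reasoning_effort": effort,  # hypothetical reasoning-depth knob
        "messages": [{"role": "user", "content": prompt}],
    }

fast = make_request("Summarize this diff.", "low")        # fast instruction-following
deep = make_request("Prove the bound in step 3.", "high")  # multi-step decomposition
```

The practical trade-off: low effort minimizes latency and token spend for routine instructions, while high effort buys the internal decomposition that math and agentic tasks need.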