GLM-4.6V-Flash
GLM-4.6V-Flash is Z.ai's lightweight 9B-parameter vision-language model for low-latency applications. It shares GLM-4.6V's multimodal capabilities at a fraction of the compute cost.
import { streamText } from 'ai'
const result = streamText({ model: 'zai/glm-4.6v-flash', prompt: 'Why is the sky blue?' })
About GLM-4.6V-Flash
GLM-4.6V-Flash is the 9B-parameter efficiency variant in Z.ai's GLM-4.6V family, released September 30, 2025. Where GLM-4.6V targets maximum capability at 106B parameters, GLM-4.6V-Flash delivers vision-language understanding at a scale suitable for latency-sensitive production workloads.
Despite its compact size, GLM-4.6V-Flash retains the core multimodal capabilities of the GLM-4.6V generation: a 128K-token context window, multimodal document understanding, and visual reasoning. The reduced parameter count translates to faster inference and lower per-token cost, which makes high-volume visual processing pipelines economically viable.
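Because the model's value lies in visual input, a request will usually pair a question with one or more images rather than send text alone. The following is a minimal sketch of how such a payload might be assembled. It assumes the AI SDK's multi-part user message shape (a `text` part followed by `image` parts); the helper name `buildVisionMessages`, the local types, and the example URL are hypothetical and not taken from this page.

```typescript
// Local types mirroring the assumed shape of an AI SDK user message with
// mixed content parts. They are declared here so the sketch has no dependencies.
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: URL }
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> }

// Hypothetical helper: pairs one question with one or more image URLs.
function buildVisionMessages(question: string, imageUrls: string[]): UserMessage[] {
  if (imageUrls.length === 0) {
    throw new Error('at least one image URL is required')
  }
  // new URL() also validates each string and throws on malformed input.
  const images: ImagePart[] = imageUrls.map((url) => ({
    type: 'image',
    image: new URL(url),
  }))
  // The question comes first, followed by the images it refers to.
  return [{ role: 'user', content: [{ type: 'text', text: question }, ...images] }]
}

const messages = buildVisionMessages('What does this chart show?', [
  'https://example.com/chart.png', // placeholder URL
])
```

Under these assumptions, the resulting array would be passed to `streamText` as `messages` in place of `prompt`, with the same `model: 'zai/glm-4.6v-flash'` string.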
Route traffic through AI Gateway for managed access, unified billing, and built-in observability across providers.