GLM-4.6V-Flash is the 9B parameter efficiency variant in Z.ai's GLM-4.6V family, released September 30, 2025. Where GLM-4.6V targets maximum capability at 106B parameters, GLM-4.6V-Flash delivers vision-language understanding at a scale suitable for latency-sensitive production workloads.
Despite its compact size, GLM-4.6V-Flash retains the core multimodal capabilities of the GLM-4.6V generation: context window of 128K tokens, multimodal document understanding, and visual reasoning. The reduced parameter count translates to faster inference and lower per-token cost, making high-volume visual processing pipelines economically viable.
Route traffic through AI Gateway for managed access, unified billing, and built-in observability across providers.