GLM 4.5V
GLM 4.5V is Z.ai's vision-language model built on GLM-4.5-Air. It supports image reasoning, long video understanding, GUI task handling, and visual grounding.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-4.5v',
  prompt: 'Why is the sky blue?',
})
```

Frequently Asked Questions
What visual inputs does GLM 4.5V support?
Single images, multiple images, long videos, screenshots, charts, documents, and GUI interfaces. It processes these alongside text prompts in a single request.
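As a sketch of what a multi-image request can look like, the message below mixes text and image parts in a single user turn, following the AI SDK's multimodal content format. The URLs and prompt text are placeholders, not real assets.

```typescript
// Sketch: one request combining a text prompt with two image parts.
// The message shape follows the AI SDK's multimodal content format;
// the URLs here are placeholders, not real assets.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image'; image: string }

const content: ContentPart[] = [
  { type: 'text', text: 'Compare the two charts and summarize the trend.' },
  { type: 'image', image: 'https://example.com/chart-q1.png' },
  { type: 'image', image: 'https://example.com/chart-q2.png' },
]

const message = { role: 'user' as const, content }

// Passed as `messages: [message]` to streamText({ model: 'zai/glm-4.5v', ... }).
console.log(message.content.length) // 3 parts in a single request
```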
What is visual grounding in GLM 4.5V?
Visual grounding lets the model identify and localize specific elements in images by returning bounding box coordinates. Coordinates are normalized by image dimensions, enabling programmatic interaction with detected visual elements.
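Since grounding coordinates are normalized by image dimensions, mapping a returned box back to pixels is a simple scale. A minimal sketch, assuming boxes normalized to [0, 1] (if the model returns per-mille coordinates in the 0–1000 range, divide by 1000 first); the box values here are illustrative, not real model output:

```typescript
// Convert a normalized bounding box (values in [0, 1]) into pixel
// coordinates for a given image size, e.g. to click a detected GUI element.
interface NormalizedBox {
  x1: number
  y1: number
  x2: number
  y2: number
}

function toPixels(box: NormalizedBox, width: number, height: number) {
  return {
    x1: Math.round(box.x1 * width),
    y1: Math.round(box.y1 * height),
    x2: Math.round(box.x2 * width),
    y2: Math.round(box.y2 * height),
  }
}

// Illustrative box (not real model output) in a 1920x1080 screenshot.
const pixelBox = toPixels({ x1: 0.25, y1: 0.1, x2: 0.75, y2: 0.6 }, 1920, 1080)
console.log(pixelBox) // { x1: 480, y1: 108, x2: 1440, y2: 648 }
```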
Does GLM 4.5V support video input?
Yes. It handles long video understanding with event recognition and temporal reasoning, processing extended video content within the context window.
How does the thinking mode work?
You can toggle thinking on or off per request. Thinking mode enables deeper chain-of-thought reasoning for complex visual tasks. Disabling it provides faster, more direct responses for simpler queries.
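A sketch of toggling thinking per request. The `zai` provider-options namespace and the `thinking: { type: ... }` shape are assumptions modeled on Z.ai's chat API; verify the exact keys in the gateway's provider documentation before relying on them.

```typescript
// Sketch: per-request thinking toggle via provider options.
// The 'zai' namespace and `thinking` shape are assumptions based on
// Z.ai's chat API; check the gateway docs for the canonical keys.
const thinkingOn = {
  model: 'zai/glm-4.5v',
  prompt: 'Walk through every step shown in this circuit diagram.',
  providerOptions: { zai: { thinking: { type: 'enabled' } } },
}

const thinkingOff = {
  model: 'zai/glm-4.5v',
  prompt: 'What color is the car in this photo?',
  providerOptions: { zai: { thinking: { type: 'disabled' } } },
}

// Each object is spread into streamText(...) or generateText(...).
```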
How do I authenticate with GLM 4.5V through AI Gateway?
AI Gateway provides a unified API key, so no separate Z.ai account is needed. Configure your API key and use the model identifier to route requests. BYOK (bring your own key) is also supported if you have a direct provider account.
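A minimal sketch of the routing convention, assuming the key is supplied via the gateway's environment variable rather than hard-coded. The `provider/model` string is what selects the route.

```typescript
// Sketch: with the gateway API key set in the environment
// (e.g. AI_GATEWAY_API_KEY), a plain `provider/model` string id is
// enough to route the request; no Z.ai credentials appear in code.
const modelId = 'zai/glm-4.5v'

// With the key configured, the call is just:
//   const { text } = await generateText({ model: modelId, prompt: '...' })

const [provider, model] = modelId.split('/')
console.log(provider, model) // zai glm-4.5v
```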
How does GLM 4.5V compare to GLM-4.6V?
GLM 4.5V builds on GLM-4.5-Air and targets vision-language tasks at its scale. GLM-4.6V is the next generation with a 128K context window, native multimodal function calling, and improved frontend replication capabilities.
Can GLM 4.5V generate images?
No. GLM 4.5V accepts visual inputs and produces text output only. For image generation, use a dedicated image generation model.