GLM-4.6V-Flash
GLM-4.6V-Flash is Z.ai's lightweight 9B-parameter vision-language model for low-latency applications. It shares GLM-4.6V's multimodal capabilities at a fraction of the compute cost.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-4.6v-flash',
  prompt: 'Why is the sky blue?',
})
```

Frequently Asked Questions
How does GLM-4.6V-Flash compare to the full GLM-4.6V?
GLM-4.6V-Flash is a 9B parameter model optimized for speed and lower per-token cost. GLM-4.6V is the full 106B parameter model for maximum visual reasoning capability. Both share a context window of 128K tokens.
Does GLM-4.6V-Flash support the same inputs as GLM-4.6V?
Yes. It processes images, documents, charts, and text within a context window of 128K tokens, though peak performance on the most complex visual reasoning tasks will be lower than the full 106B model.
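As a sketch of how an image input might be combined with text in a single request: the multi-part message shape below follows the AI SDK's content-part format, and the image URL and prompt text are placeholders, not values from this page.

```typescript
// Sketch: a multi-part user message (text + image) for a vision request.
// The content-part shape mirrors the AI SDK's message format; the URL
// and prompt text are placeholders.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image'; image: string }

const message: { role: 'user'; content: ContentPart[] } = {
  role: 'user',
  content: [
    { type: 'text', text: 'Summarize the chart in this image.' },
    { type: 'image', image: 'https://example.com/chart.png' },
  ],
}

// Such a message would then be passed to the model, e.g.:
// streamText({ model: 'zai/glm-4.6v-flash', messages: [message] })
console.log(message.content.length)
```

Because all content parts travel in one message, the image and the instruction share the same 128K-token context window.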
What is the context window for GLM-4.6V-Flash?
128K tokens, matching the full GLM-4.6V model.
How do I authenticate with GLM-4.6V-Flash through AI Gateway?
AI Gateway provides a unified API key. No separate Z.ai account is needed. Specify the model identifier and AI Gateway handles routing. BYOK is also supported.
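A minimal setup sketch: assuming the gateway reads its unified key from an environment variable, configuration amounts to exporting that key before running your app (the key value is a placeholder).

```shell
# Assumed environment variable for the AI Gateway unified key
# (placeholder value; with BYOK you would supply your own provider key instead).
export AI_GATEWAY_API_KEY="your-gateway-key"
```

With the key in place, requests that name `zai/glm-4.6v-flash` as the model identifier are routed by the gateway with no separate Z.ai credentials.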
What is the pricing for GLM-4.6V-Flash?
See the pricing section on this page for current rates. AI Gateway lists each provider's pricing for GLM-4.6V-Flash.
Is GLM-4.6V-Flash suitable for agentic visual workflows?
Yes, for agent steps that prioritize speed. For complex visual planning requiring deep reasoning or native multimodal function calling at peak accuracy, route those steps to the full GLM-4.6V model.
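The routing pattern above can be sketched as a small helper that picks a model identifier per agent step; the `Step` type and `pickModel` function are illustrative, not part of any SDK.

```typescript
// Sketch: route agent steps between the two models by required depth.
// This helper and the step labels are hypothetical; only the model
// identifiers ('zai/glm-4.6v-flash', 'zai/glm-4.6v') come from this page.
type Step = { name: string; needsDeepReasoning: boolean }

function pickModel(step: Step): string {
  // Speed-sensitive steps go to Flash; complex visual planning or
  // peak-accuracy function calling goes to the full model.
  return step.needsDeepReasoning ? 'zai/glm-4.6v' : 'zai/glm-4.6v-flash'
}

console.log(pickModel({ name: 'extract-receipt-fields', needsDeepReasoning: false }))
```

Since both models share the same 128K-token context window, intermediate results can be passed between steps without re-chunking.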