GLM 5V Turbo

GLM 5V Turbo is Z.ai's vision-enabled turbo model released April 1, 2026. It turns screenshots and designs into code, debugs visually, and operates GUIs autonomously, combining GLM-5's agentic capabilities with multimodal vision input at a compact parameter size.

Reasoning · Tool Use · Implicit Caching · Vision (Image) · File Input
index.ts

import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-5v-turbo',
  prompt: 'Why is the sky blue?',
})

// Stream the response tokens to stdout as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

Playground

Try out GLM 5V Turbo by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About GLM 5V Turbo

GLM 5V Turbo was released April 1, 2026 as the vision-enabled turbo variant in Z.ai's GLM-5 generation. It combines GLM-5's agentic capabilities with multimodal vision input, purpose-built for workflows where visual understanding drives code generation and UI interaction.

The model focuses on design-to-code generation. Given a screenshot or design mockup, GLM 5V Turbo produces responsive components that match the original layout. It can debug visually by examining screenshots of rendered output and identifying discrepancies, then generating fixes. The model also navigates real GUI environments autonomously, reading screen elements and performing actions without manual scripting.

Despite these multimodal capabilities, GLM 5V Turbo operates at a smaller parameter size than comparable vision-language models. This translates to faster inference and lower cost per request, making high-volume visual coding workflows economically viable. Through AI Gateway, it's accessible via the same unified API with built-in observability and provider routing.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Release Date
Z.ai (Legal: Terms, Privacy) | 200K | 0.8s | 209 tps | $1.20/M | $4.00/M | $0.24/M | | 04/01/2026
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Z.ai

Model | Context | Latency | Throughput | Input | Output | Cache Read | Providers | Release Date
 | 205K | 0.6s | 51 tps | $1.40/M | $4.40/M | $0.26/M | deepinfra, fireworks, novita, +1 | 04/07/2026
 | 203K | 0.9s | 129 tps | $1.20/M | $4.00/M | $0.24/M | zai | 03/15/2026
 | 203K | 0.5s | 86 tps | $0.80/M | $2.56/M | $0.16/M | bedrock, deepinfra, fireworks, +3 | 02/12/2026
 | 205K | 0.1s | 596 tps | $2.25/M | $2.75/M | $2.25/M | bedrock, cerebras, deepinfra, +2 | 12/22/2025
 | 205K | 0.4s | 93 tps | $0.60/M | $2.20/M | $0.11/M | baseten, deepinfra, novita, +1 | 09/30/2025
 | 200K | 0.1s | 240 tps | $0.07/M | $0.40/M | $0.01/M | bedrock, zai |

What To Consider When Choosing a Provider

  • Configuration: GLM 5V Turbo converts visual designs to code. For best results, provide clean screenshots at sufficient resolution and specify the target framework (React, HTML/CSS, etc.) in your prompt.
  • Configuration: You can use GLM 5V Turbo in an iterative loop: render code, screenshot the result, feed it back to the model for corrections. This workflow leverages both vision and coding capabilities.
  • Configuration: The compact parameter size means faster inference, but the most complex visual reasoning tasks may benefit from the full GLM-4.6V (106B). Benchmark on your specific use cases.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use GLM 5V Turbo

Best For

  • Design-to-code generation: Converts screenshots and mockups into responsive React components, HTML, and CSS
  • Visual debugging: Examines rendered output, identifies layout issues, and generates fixes
  • GUI automation: Navigates real screen environments autonomously for testing and interaction workflows
  • Agentic visual coding pipelines: Combines image understanding with autonomous code planning and iteration
  • High-volume visual processing: The compact parameter size keeps inference fast and cost-effective

Consider Alternatives When

  • Maximum visual reasoning depth: GLM-4.6V (106B) is the largest vision-language model in the lineup, suited to workloads where speed is not a constraint
  • Text-only workloads: GLM-5-Turbo offers the same generation's speed without vision overhead
  • Simple captioning or classification: A lighter vision model may be more cost-effective for basic image tasks
  • Deepest text reasoning: The full GLM-5 text-only variant provides multiple thinking modes without vision input

Conclusion

GLM 5V Turbo bridges vision and code generation in a fast, compact package. For teams building design-to-code pipelines, visual debugging loops, or autonomous GUI agents, it delivers the GLM-5 generation's agentic capabilities with multimodal input at a practical speed and cost profile.

Frequently Asked Questions

  • What can GLM 5V Turbo do with screenshots?

    It converts screenshots and design mockups into responsive code, identifies visual bugs in rendered output, and navigates GUI environments by reading screen elements and performing actions.

  • How does GLM 5V Turbo compare to GLM-4.6V?

    GLM 5V Turbo is a newer, compact vision model focused on coding and GUI tasks. GLM-4.6V is a larger 106B parameter model with broader vision-language capabilities including native multimodal function calling and interleaved image-text generation.

  • Does GLM 5V Turbo support design-to-code generation?

    Yes. It's specifically built for this workflow. Provide a screenshot or design mockup and specify the target framework. The model generates matching responsive components.

  • What is the context window for GLM 5V Turbo?

    200K tokens.

  • How do I authenticate with GLM 5V Turbo through AI Gateway?

    AI Gateway provides a unified API key. No separate Z.ai account is needed. Use the zai/glm-5v-turbo model identifier to route requests. BYOK is also supported.

  • Can GLM 5V Turbo operate GUIs autonomously?

    Yes. It reads screen elements, interprets visual context, and performs navigation actions in real GUI environments. This makes it useful for automated testing and UI interaction workflows.

  • What is the pricing for GLM 5V Turbo?

    Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves GLM 5V Turbo.