GLM 5 Turbo

GLM 5 Turbo is the speed-optimized variant of Z.ai's GLM-5, released March 15, 2026. It trades some reasoning depth for faster throughput and lower latency while retaining GLM-5's multiple thinking modes and agentic capabilities.

Reasoning · Tool Use · Implicit Caching
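index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-5-turbo',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}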

Playground

Try out GLM 5 Turbo by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Context | Latency | Throughput | Input | Output | Cache Read | Release Date |
|---|---|---|---|---|---|---|---|
| Z.ai (Legal: Terms, Privacy) | 203K | 0.9s | 98 tps | $1.20/M | $4.00/M | $0.24/M | 03/15/2026 |

Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Z.ai

| Model | Context | Latency | Throughput | Input | Output | Cache Read | Providers | Release Date |
|---|---|---|---|---|---|---|---|---|
| | 205K | 1.0s | 53 tps | $1.40/M | $4.40/M | $0.26/M | deepinfra, fireworks, novita, +1 | 04/07/2026 |
| | 200K | 0.9s | 161 tps | $1.20/M | $4.00/M | $0.24/M | zai | 04/01/2026 |
| | 203K | 0.4s | 70 tps | $0.80/M | $2.56/M | $0.16/M | bedrock, deepinfra, fireworks, +3 | 02/12/2026 |
| | 205K | 0.1s | 557 tps | $2.25/M | $2.75/M | $2.25/M | bedrock, cerebras, deepinfra, +2 | 12/22/2025 |
| | 205K | 0.4s | 243 tps | $0.60/M | $2.20/M | $0.11/M | baseten, deepinfra, novita, +1 | 09/30/2025 |
| | 200K | 0.1s | 164 tps | $0.07/M | $0.40/M | $0.01/M | bedrock, zai | |

About GLM 5 Turbo

GLM 5 Turbo was released March 15, 2026 as the speed-optimized variant in Z.ai's GLM-5 generation. The GLM-5 generation introduced selectable thinking modes so you can dial reasoning depth per request, and GLM 5 Turbo makes that capability affordable at production scale.

Agentic pipelines benefit the most. Many pipeline steps don't require the full GLM-5's deliberation depth, but they do benefit from the structured thinking modes when problems get harder. GLM 5 Turbo lets you route routine steps to a lightweight thinking mode for fast responses, then escalate harder steps to a deeper mode, all within the same model and API call format.
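
The routing idea in the paragraph above can be sketched as a small helper that chooses a thinking mode per pipeline step. This is a minimal illustration only: the mode labels `'light'` and `'deep'`, the `difficulty` score, and the threshold are hypothetical placeholders, since this page does not name the actual modes or the request parameter that selects them.

```typescript
// Hypothetical labels: this page does not name GLM 5 Turbo's thinking modes.
type ThinkingMode = 'light' | 'deep'

interface PipelineStep {
  name: string
  // Caller-supplied difficulty estimate in [0, 1]; how it is computed is up to you.
  difficulty: number
}

// Route routine steps to the lightweight mode and escalate harder ones.
function pickThinkingMode(step: PipelineStep, threshold = 0.6): ThinkingMode {
  return step.difficulty >= threshold ? 'deep' : 'light'
}

const steps: PipelineStep[] = [
  { name: 'classify-ticket', difficulty: 0.1 },
  { name: 'extract-fields', difficulty: 0.3 },
  { name: 'plan-refactor', difficulty: 0.9 },
]

// Every step targets the same model; only the thinking mode changes.
const plan = steps.map((step) => ({
  step: step.name,
  model: 'zai/glm-5-turbo',
  thinkingMode: pickThinkingMode(step),
}))

console.log(plan)
```

Keeping the decision in one function makes it easy to tune the threshold against your own benchmarks without touching the rest of the pipeline.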

The turbo variant also inherits GLM-5's improved long-range planning and agentic coding capabilities. Combined with the lower per-token cost and faster throughput, this makes it practical to run multi-step agent workflows that would be prohibitively expensive at full GLM-5 pricing. Through AI Gateway, GLM 5 Turbo shares the same API surface as GLM-5.
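
To see why throughput matters for multi-step agents, a rough back-of-the-envelope estimate can help. The sketch below uses the P50 figures listed for the Z.ai provider on this page (0.9s time to first token, 98 tokens per second) and assumes steps run sequentially; it ignores tool execution time, input processing, and retries, so treat the result as a lower bound rather than a measurement. The function and type names are illustrative.

```typescript
interface LatencyProfile {
  ttftSeconds: number // time to first token
  tokensPerSecond: number // output throughput
}

// P50 figures listed for the Z.ai provider on this page.
const glm5Turbo: LatencyProfile = { ttftSeconds: 0.9, tokensPerSecond: 98 }

// Estimated wall-clock time for a sequential agent loop:
// each step waits for the first token, then streams its output.
function estimateLoopSeconds(
  profile: LatencyProfile,
  steps: number,
  outputTokensPerStep: number,
): number {
  const perStep = profile.ttftSeconds + outputTokensPerStep / profile.tokensPerSecond
  return steps * perStep
}

const seconds = estimateLoopSeconds(glm5Turbo, 20, 490)
console.log(`Estimated loop time: ${seconds.toFixed(1)}s`)
```

With 20 steps of 490 output tokens each, every step takes about 0.9 + 5 = 5.9 seconds, or roughly 118 seconds for the whole loop. Any per-step saving is multiplied by the step count.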

What To Consider When Choosing a Provider

  • Configuration: Like GLM-5, GLM 5 Turbo supports multiple thinking modes. Match the mode to the task: lightweight modes for extraction and classification, deeper modes for multi-step reasoning. Even the deeper modes run faster than their equivalents on the full GLM-5.
  • Configuration: For the most complex reasoning chains, the full GLM-5 will produce higher-quality results. Benchmark both on your hardest tasks to quantify the difference.
  • Configuration: Switching between GLM-5 and GLM 5 Turbo requires only changing the model identifier. No integration changes needed.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
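
Because switching models only requires changing the model identifier, the choice can be isolated in one place. The sketch below builds the options object that would be passed to `streamText`. Note that only `'zai/glm-5-turbo'` appears on this page; `'zai/glm-5'` is an assumed identifier for the full model, and `buildRequest` is a hypothetical helper, so check the model list before relying on either.

```typescript
// 'zai/glm-5-turbo' appears on this page.
// 'zai/glm-5' is an ASSUMED identifier for the full model; verify it before use.
const MODELS = {
  turbo: 'zai/glm-5-turbo',
  full: 'zai/glm-5',
} as const

type Tier = keyof typeof MODELS

interface RequestOptions {
  model: string
  prompt: string
}

// Hypothetical helper: the only difference between tiers is the model string.
function buildRequest(tier: Tier, prompt: string): RequestOptions {
  return { model: MODELS[tier], prompt }
}

const fast = buildRequest('turbo', 'Extract the invoice number from the text.')
const deep = buildRequest('full', 'Plan a migration from REST to gRPC.')

// Either object has the same shape as the streamText call shown earlier on this page.
console.log(fast.model, deep.model)
```

This makes it straightforward to benchmark both models on your hardest tasks: run the same prompts through each tier and compare the results.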

When to Use GLM 5 Turbo

Best For

  • High-volume agentic pipelines: Most steps need GLM-5-class capability at lower latency and cost
  • Structured data extraction: Documents where speed matters as much as accuracy
  • Real-time coding assistance: Fast responses improve developer productivity without sacrificing agentic capabilities
  • Production deployments at scale: Per-token cost directly impacts margins
  • Multi-step workflows: Fast execution steps on GLM 5 Turbo pair with complex reasoning steps on the full GLM-5

Consider Alternatives When

  • Maximum reasoning depth: The full GLM-5 provides the deepest deliberation in the generation on every request
  • Vision or multimodal input: GLM-5V-Turbo adds image understanding to the turbo tier
  • Frontend code focus: GLM-4.7 offers targeted frontend improvements at lower cost
  • Absolute fastest inference: GLM-4.7-FlashX provides the lowest latency option when minimal capability is acceptable

Conclusion

Selectable thinking modes at production-friendly pricing make GLM 5 Turbo the practical entry point for teams adopting GLM-5 generation capabilities. Route agentic workflows through AI Gateway and scale between thinking depth levels per request.

Frequently Asked Questions

  • How does GLM 5 Turbo compare to the full GLM-5?

    GLM 5 Turbo shares GLM-5's core capabilities, including multiple thinking modes and enhanced agentic coding. It's optimized for faster inference at lower cost, with some reduction in peak reasoning depth on the most complex tasks.

  • Does GLM 5 Turbo support multiple thinking modes?

    Yes. It retains GLM-5's multiple thinking modes, letting you select the reasoning depth per request. All modes run faster than their equivalents on the full GLM-5.

  • What is the context window for GLM 5 Turbo?

    202.8K tokens.

  • Can I switch between GLM-5 and GLM 5 Turbo easily?

    Yes. Both share the same API surface. Change the model identifier to switch between them without any other integration changes.

  • How do I authenticate with GLM 5 Turbo through AI Gateway?

    AI Gateway provides a unified API key. No separate Z.ai account is needed. Use the model identifier to route requests. Bring your own key (BYOK) is also supported for direct provider access.

  • Is GLM 5 Turbo good for agentic coding?

    Yes. It inherits GLM-5's improvements in autonomous tool use, code planning, and multi-step iteration. The faster inference makes it practical for agent loops where speed compounds across many steps.

  • What is the pricing for GLM 5 Turbo?

    See the pricing section on this page for current rates. AI Gateway exposes each provider's pricing for GLM 5 Turbo.
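
As a worked example of how the listed rates combine, the sketch below estimates the cost of a request from its token counts, using the Z.ai rates shown on this page ($1.20/M input, $4.00/M output, $0.24/M cache read). It assumes that cached input tokens are billed at the cache read rate instead of the input rate and that `inputTokens` includes the cached portion; the function and field names are illustrative, and actual billing may differ.

```typescript
// USD per million tokens, as listed for the Z.ai provider on this page.
const RATES = { input: 1.2, output: 4.0, cacheRead: 0.24 }

interface Usage {
  inputTokens: number // total input tokens, including cached ones
  cachedInputTokens: number // portion of the input served from cache
  outputTokens: number
}

// ASSUMPTION: cached input is billed at the cache read rate instead of the input rate.
function estimateCostUSD(usage: Usage): number {
  const uncached = usage.inputTokens - usage.cachedInputTokens
  const total =
    uncached * RATES.input +
    usage.cachedInputTokens * RATES.cacheRead +
    usage.outputTokens * RATES.output
  return total / 1e6
}

const cost = estimateCostUSD({
  inputTokens: 1_000_000,
  cachedInputTokens: 500_000,
  outputTokens: 200_000,
})
console.log(`Estimated cost: $${cost.toFixed(2)}`)
```

For this usage, the estimate is $0.60 for uncached input, $0.12 for cached input, and $0.80 for output, about $1.52 in total.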