
GLM 4.7 Flash

GLM 4.7 Flash is the speed-optimized variant in Z.ai's GLM-4.7 generation. It delivers faster inference for high-throughput workloads while retaining the coding, tool usage, and conversational improvements introduced in GLM-4.7.

Reasoning · Tool Use
index.ts
import { streamText } from 'ai'

// Stream a completion from GLM 4.7 Flash through AI Gateway.
const result = streamText({
  model: 'zai/glm-4.7-flash',
  prompt: 'Why is the sky blue?'
})

Playground

Try out GLM 4.7 Flash by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider | Legal | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write
Z.ai | Terms, Privacy | 200K | - | - | $0.07/M | $0.40/M | $0.01/M | -
Amazon Bedrock | Terms, Privacy | 200K | 0.1s | - | $0.07/M | $0.40/M | - | -
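
The provider preference described in this section can be sketched as a small helper. This is a minimal sketch, not the gateway's documented API: it assumes the preference is passed as an ordered list under `providerOptions.gateway.order`, and that `bedrock` and `zai` are the slugs for the two providers listed here. Copy the real slugs from the table and confirm the option name in the docs. `preferProviders` is a hypothetical name.

```typescript
// Hypothetical helper: builds an options object that expresses a provider
// preference. The `gateway.order` shape and the slugs are assumptions.
type ProviderSlug = string;

interface GatewayPreference {
  gateway: { order: ProviderSlug[] };
}

function preferProviders(preferred: ProviderSlug[]): GatewayPreference {
  // Normalize slugs, then drop blanks and duplicates while keeping order.
  const seen = new Set<ProviderSlug>();
  const order: ProviderSlug[] = [];
  for (const raw of preferred) {
    const slug = raw.trim().toLowerCase();
    if (slug.length > 0 && !seen.has(slug)) {
      seen.add(slug);
      order.push(slug);
    }
  }
  return { gateway: { order } };
}

// Prefer Amazon Bedrock first, then Z.ai (assumed slugs).
const providerOptions = preferProviders(['bedrock', 'zai']);
console.log(JSON.stringify(providerOptions));
```

Under these assumptions, the resulting object would be passed as `providerOptions` next to `model: 'zai/glm-4.7-flash'` in the `streamText` call.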
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
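
To make the P50 and success-rate figures concrete, here is a minimal sketch of how such numbers can be computed from raw samples. The page does not state the exact method AI Gateway uses, so the nearest-rank median below is an assumption, and the sample values are made up for illustration.

```typescript
// Median (P50) of a list of samples, using the nearest-rank method.
// The choice of nearest-rank is an assumption for this sketch.
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.5 * sorted.length); // 1-based nearest rank
  return sorted[rank - 1];
}

// Share of requests that succeeded, as a percentage.
function successRate(succeeded: number, total: number): number {
  if (total <= 0) throw new Error('total must be positive');
  return (succeeded / total) * 100;
}

// Hypothetical time-to-first-token samples in milliseconds.
const ttftMs = [90, 110, 100, 450, 95];
console.log(p50(ttftMs));
console.log(successRate(998, 1000));
```

For these samples the P50 is 100 ms: the single 450 ms outlier does not move the median, which is why P50 is a useful summary of typical latency and throughput.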

More models by Z.ai

Model | Context | Latency | Throughput | Input | Output | Cache Read | Providers | Release Date
- | 205K | 0.9s | 60tps | $1.40/M | $4.40/M | $0.26/M | deepinfra, fireworks, novita, +1 | 04/07/2026
- | 200K | 0.9s | 61tps | $1.20/M | $4.00/M | $0.24/M | zai | 04/01/2026
- | 203K | 0.9s | 86tps | $1.20/M | $4.00/M | $0.24/M | zai | 03/15/2026
- | 203K | 0.4s | 92tps | $0.80/M | $2.56/M | $0.16/M | bedrock, deepinfra, fireworks, +3 | 02/12/2026
- | 205K | 0.1s | 961tps | $2.25/M | $2.75/M | $2.25/M | bedrock, cerebras, deepinfra, +2 | 12/22/2025
- | 205K | 0.4s | 127tps | $0.60/M | $2.20/M | $0.11/M | baseten, deepinfra, novita, +1 | 09/30/2025

About GLM 4.7 Flash

GLM 4.7 Flash is the middle tier in Z.ai's GLM-4.7 generation, sitting between the full GLM-4.7 and the ultra-fast GLM-4.7-FlashX. It inherits the 4.7 generation's gains in coding assistance, tool usage, multi-step reasoning, and natural conversational tone while trading peak capability for faster inference.

The GLM-4.7 generation focused on closing coding and tool-use gaps with competing models. GLM 4.7 Flash carries those gains forward at a cost-and-latency profile that fits high-volume coding assistance, real-time chat, and production pipelines with strict response time budgets. If the full GLM-4.7 is too slow and GLM-4.7-FlashX strips too much capability, GLM 4.7 Flash is the compromise.

Through AI Gateway, switching between GLM-4.7 tiers requires only changing the model identifier. The API surface and request format stay the same.
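
As a rough illustration of switching tiers by identifier, the sketch below builds the same request for each tier and changes only the `model` field. Only `zai/glm-4.7-flash` appears on this page. The identifiers for the full and FlashX tiers are assumptions that follow the same naming pattern and should be checked before use. `Glm47Tier`, `MODEL_IDS`, and `buildRequest` are hypothetical names.

```typescript
// Tiers in the GLM-4.7 generation, as described above.
type Glm47Tier = 'full' | 'flash' | 'flashx';

// Only 'zai/glm-4.7-flash' is confirmed by this page; the other two
// identifiers are assumed and should be verified.
const MODEL_IDS: Record<Glm47Tier, string> = {
  full: 'zai/glm-4.7',
  flash: 'zai/glm-4.7-flash',
  flashx: 'zai/glm-4.7-flashx',
};

interface GatewayRequest {
  model: string;
  prompt: string;
}

// The request shape stays the same; only `model` changes per tier.
function buildRequest(tier: Glm47Tier, prompt: string): GatewayRequest {
  return { model: MODEL_IDS[tier], prompt };
}

const request = buildRequest('flash', 'Why is the sky blue?');
console.log(request.model);
```

Keeping the tier in one lookup table means an A/B test across tiers is a change to a single argument rather than to the integration.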

What To Consider When Choosing a Provider

  • Tier selection: GLM 4.7 Flash sits in the middle of the 4.7 generation. Test it against both GLM-4.7 (higher capability) and GLM-4.7-FlashX (higher speed) on your specific tasks to find the right tradeoff.
  • API compatibility: All GLM-4.7 variants share the same API. You can A/B test across tiers without changing your integration.
  • Cost: The reduced per-token cost makes GLM 4.7 Flash practical for high-volume deployments where GLM-4.7's per-request cost would be prohibitive.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
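
The authentication point above can be sketched as a helper that picks whichever credential is available. This is a sketch under stated assumptions: the environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, and the use of a bearer `Authorization` header, are not stated on this page and should be confirmed in the documentation. `gatewayAuthHeader` is a hypothetical name.

```typescript
// Hypothetical helper: picks a credential and builds a bearer header.
// The environment variable names and header format are assumptions.
type Env = Record<string, string | undefined>;

function gatewayAuthHeader(env: Env): { Authorization: string } {
  // Prefer an explicit API key; fall back to an OIDC token if present.
  const token = env.AI_GATEWAY_API_KEY || env.VERCEL_OIDC_TOKEN;
  if (!token) {
    throw new Error('Set an AI Gateway API key or provide an OIDC token');
  }
  return { Authorization: `Bearer ${token}` };
}

// Example with a placeholder key; no provider credentials are involved.
const headers = gatewayAuthHeader({ AI_GATEWAY_API_KEY: 'example-key' });
console.log(Object.keys(headers));
```

Either credential identifies you to AI Gateway itself, which is why no Z.ai or Amazon Bedrock key appears anywhere in the helper.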

When to Use GLM 4.7 Flash

Best For

  • High-volume coding assistance: Fast response times improve developer productivity across many concurrent sessions
  • Real-time conversational applications: Delivers the 4.7 generation's natural tone within strict latency thresholds
  • Production API backends: High request volumes where cost per token directly impacts margins
  • Agentic pipelines: Most steps need good capability at speed, with the option to route complex steps to the full GLM-4.7
  • Interactive prototyping and development: Fast iteration cycles depend on quick model responses
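
The agentic-pipeline pattern in the list above, where most steps run on GLM 4.7 Flash and complex steps go to the full GLM-4.7, could look roughly like this. The `complex` flag and the step names are hypothetical, and `zai/glm-4.7` is an assumed identifier for the full tier. Only `zai/glm-4.7-flash` is confirmed by this page.

```typescript
// The Flash id is from this page; the full-tier id is an assumption that
// follows the same naming pattern.
const FLASH_MODEL = 'zai/glm-4.7-flash';
const FULL_MODEL = 'zai/glm-4.7';

interface PipelineStep {
  name: string;
  // Hypothetical flag set by the caller for steps that need deeper reasoning.
  complex: boolean;
}

// Default to the faster tier; escalate only the steps marked complex.
function modelForStep(step: PipelineStep): string {
  return step.complex ? FULL_MODEL : FLASH_MODEL;
}

const steps: PipelineStep[] = [
  { name: 'classify-request', complex: false },
  { name: 'plan-refactor', complex: true },
  { name: 'summarize-result', complex: false },
];

const plan = steps.map((s) => ({ step: s.name, model: modelForStep(s) }));
console.log(plan);
```

How a step gets marked complex is left to the application. A simple starting point is a static flag per step type, refined later with results from your own evaluations.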

Consider Alternatives When

  • Maximum complex-task capability: The full GLM-4.7 provides the deepest reasoning and coding quality
  • Absolute fastest inference: GLM-4.7-FlashX offers the lowest latency in the 4.7 generation
  • Vision capabilities needed: Evaluate GLM-4.6V or GLM-4.5V for multimodal input
  • Advanced reasoning modes: GLM-5 provides multiple thinking modes and an expanded reasoning architecture

Conclusion

GLM 4.7 Flash sits between the full GLM-4.7 and GLM-4.7-FlashX: fast enough for many production latency budgets, and more capable than FlashX on heavier coding and tool-use tasks. Switch between 4.7 tiers through AI Gateway by changing the model identifier.

Frequently Asked Questions

  • How does GLM 4.7 Flash compare to the full GLM-4.7?

    GLM 4.7 Flash shares the same foundational improvements (coding, tool usage, multi-step reasoning, natural tone) but is optimized for faster inference at lower cost. Peak capability on complex tasks will be lower than GLM-4.7.

  • What is the difference between GLM 4.7 Flash and GLM-4.7-FlashX?

    GLM 4.7 Flash provides more capability with moderate speed optimization. GLM-4.7-FlashX is the fastest tier in the generation, trading more capability for the lowest possible latency.

  • Can I switch between GLM-4.7 variants easily?

    Yes. All variants share the same API surface. Change the model identifier to switch between GLM-4.7, GLM-4.7-Flash, and GLM-4.7-FlashX.

  • What is the context window for GLM 4.7 Flash?

    200K tokens.

  • How do I authenticate with GLM 4.7 Flash through AI Gateway?

    AI Gateway authenticates requests with a single API key or OIDC token, so no separate Z.ai account is needed. Use the model identifier to route requests. BYOK (bring your own key) is supported if you want to use your own provider account.

  • Is GLM 4.7 Flash suitable for frontend development?

    Yes. It inherits the frontend development improvements from GLM-4.7, though the full GLM-4.7 may produce slightly better results on complex UI generation tasks.

  • What is the pricing for GLM 4.7 Flash?

    See the Providers table on this page for current rates. AI Gateway exposes each provider's pricing for GLM 4.7 Flash.
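
As a worked example of per-token pricing, the sketch below estimates the cost of a request from the Z.ai rates listed on this page at the time of capture ($0.07/M input, $0.40/M output, $0.01/M cache read). It assumes cached input tokens are billed at the cache-read rate instead of the input rate, which this page does not spell out. Rates can change, so treat the constants as placeholders.

```typescript
interface Rates {
  inputPerM: number;     // USD per million input tokens
  outputPerM: number;    // USD per million output tokens
  cacheReadPerM: number; // USD per million cached input tokens
}

// Z.ai rates as listed on this page; they may be out of date.
const ZAI_RATES: Rates = { inputPerM: 0.07, outputPerM: 0.4, cacheReadPerM: 0.01 };

interface Usage {
  inputTokens: number;       // total input tokens, cached ones included
  cachedInputTokens: number; // subset of inputTokens served from cache
  outputTokens: number;
}

// Assumption: cached tokens are charged the cache-read rate only.
function estimateCostUsd(usage: Usage, rates: Rates): number {
  const uncached = usage.inputTokens - usage.cachedInputTokens;
  const total =
    uncached * rates.inputPerM +
    usage.cachedInputTokens * rates.cacheReadPerM +
    usage.outputTokens * rates.outputPerM;
  return total / 1e6;
}

const cost = estimateCostUsd(
  { inputTokens: 1e6, cachedInputTokens: 0, outputTokens: 1e6 },
  ZAI_RATES,
);
console.log(cost.toFixed(2));
```

With one million uncached input tokens and one million output tokens, the estimate is about $0.47 (0.07 + 0.40) at these rates.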