
GLM 4.5 Air

GLM 4.5 Air is Z.ai's efficiency-focused model, released July 28, 2025. It delivers fast inference for high-volume workloads while retaining reasoning and coding capability at a lower cost than the full GLM-4.5.

Capabilities: Reasoning, Tool Use, Implicit Caching
index.ts
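import { streamText } from 'ai'
// A plain 'creator/model' string is routed through AI Gateway
const result = streamText({
  model: 'zai/glm-4.5-air',
  prompt: 'Why is the sky blue?',
})
// Print text chunks as they stream in
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}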

Playground

Try out GLM 4.5 Air by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About GLM 4.5 Air

GLM 4.5 Air was released July 28, 2025 as the efficiency-optimized variant in Z.ai's GLM-4.5 generation. Where GLM-4.5 targets maximum capability, GLM 4.5 Air trades a degree of depth for faster inference and lower per-token cost, making it practical for high-throughput production pipelines.

The model retains the core reasoning, coding, and agentic capabilities of the GLM-4.5 family while operating at reduced computational overhead. This positions it for workloads where response latency and cost per request are primary constraints: classification, extraction, summarization, and conversational applications that process high volumes of requests.

GLM 4.5 Air shares the full GLM-4.5 model's 128K-token context window. Through AI Gateway, it benefits from unified API access, built-in observability, and provider routing with automatic retries.
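When budgeting long prompts, a request only fits if the prompt and the completion you reserve stay inside the window together. The sketch below assumes "128K" means 128,000 tokens and that input and output share the window (both assumptions; confirm the provider's exact limit). The helper `fitsInContext` is hypothetical, and token counts must come from a tokenizer you trust.

```typescript
// Assumption: "128K" is treated as 128,000 tokens shared by input and output.
const GLM_45_AIR_CONTEXT_TOKENS = 128_000;

/**
 * Hypothetical helper: returns true when the prompt plus the reserved
 * completion budget fits inside the context window.
 */
function fitsInContext(
  promptTokens: number,
  maxOutputTokens: number,
  contextTokens: number = GLM_45_AIR_CONTEXT_TOKENS,
): boolean {
  if (promptTokens < 0 || maxOutputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return promptTokens + maxOutputTokens <= contextTokens;
}

// A 120K-token prompt leaves room for 8K output tokens, but not for 10K.
console.log(fitsInContext(120_000, 8_000)); // true
console.log(fitsInContext(120_000, 10_000)); // false
```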

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Context | Latency | Throughput | Input | Output | Cache read | Release date | Legal |
|---|---|---|---|---|---|---|---|---|
| Z.ai | 128K | 0.8s | 63 tps | $0.20/M | $1.10/M | $0.03/M | 07/28/2025 | Terms, Privacy |
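A provider preference can also be set in code. This sketch assumes the AI SDK's `providerOptions.gateway.order` field (described in the AI Gateway provider-options documentation) and assumes `zai` is the slug for the Z.ai row above. The helper `withProviderOrder` is hypothetical; it only builds the options object, which you would then pass to `streamText`.

```typescript
// Only the fields used here are modeled; this is not the SDK's full options type.
interface GatewayRequestOptions {
  model: string;
  prompt: string;
  providerOptions: {
    gateway: { order: string[] };
  };
}

/**
 * Hypothetical helper: builds request options asking AI Gateway to try the
 * listed provider slugs in order. The slug "zai" is an assumption.
 */
function withProviderOrder(
  model: string,
  prompt: string,
  order: string[],
): GatewayRequestOptions {
  if (order.length === 0) {
    throw new Error("order must list at least one provider slug");
  }
  return {
    model,
    prompt,
    // Copy the array so later mutation by the caller does not leak in.
    providerOptions: { gateway: { order: [...order] } },
  };
}

const options = withProviderOrder("zai/glm-4.5-air", "Why is the sky blue?", ["zai"]);
console.log(JSON.stringify(options.providerOptions)); // {"gateway":{"order":["zai"]}}
// In an application: streamText(options)
```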
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
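Latency and throughput combine into a rough wall-clock estimate for one streamed response: time to first token plus the output length divided by the token rate. The sketch below uses the P50 figures from the Z.ai row in the providers table (0.8s, 63 tps); because they are medians, the result is a typical value rather than a worst case, and the function name is hypothetical.

```typescript
/**
 * Rough wall-clock estimate for one streamed response:
 * time to first token plus decode time at a steady token rate.
 */
function estimateResponseSeconds(
  ttftSeconds: number,
  tokensPerSecond: number,
  outputTokens: number,
): number {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return ttftSeconds + outputTokens / tokensPerSecond;
}

// P50 figures from the Z.ai row: 0.8 s TTFT, 63 tokens/s, 500-token answer.
const seconds = estimateResponseSeconds(0.8, 63, 500);
console.log(seconds.toFixed(2)); // 8.74
```

Most of that time is decode, so shortening outputs usually helps more than shaving TTFT for long answers.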

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Z.ai

| Context | Latency | Throughput | Input | Output | Cache read | Providers | Release date |
|---|---|---|---|---|---|---|---|
| 205K | 0.8s | 44 tps | $1.40/M | $4.40/M | $0.26/M | DeepInfra, Fireworks, Novita, +1 | 04/07/2026 |
| 200K | 0.9s | 111 tps | $1.20/M | $4.00/M | $0.24/M | Z.ai | 04/01/2026 |
| 203K | 0.9s | 107 tps | $1.20/M | $4.00/M | $0.24/M | Z.ai | 03/15/2026 |
| 203K | 0.5s | 128 tps | $0.80/M | $2.56/M | $0.16/M | Bedrock, DeepInfra, Fireworks, +3 | 02/12/2026 |
| 205K | 0.1s | 601 tps | $2.25/M | $2.75/M | $2.25/M | Bedrock, Cerebras, DeepInfra, +2 | 12/22/2025 |
| 200K | 0.2s | | $0.07/M | $0.40/M | $0.01/M | Bedrock, Z.ai | |

What To Consider When Choosing a Provider

  • Capability: GLM 4.5 Air is optimized for speed. For tasks requiring deep multi-step reasoning, the full GLM-4.5 or later models such as GLM-5 may produce better results.
  • Cost: At $0.20 per million input tokens and $1.10 per million output tokens, GLM 4.5 Air is designed for workloads where unit economics matter. Estimate your monthly token volume to compare total cost against heavier alternatives.
  • Compatibility: GLM 4.5 Air uses the same API interface as GLM-4.5, so switching between the two requires changing only the model identifier.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
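A monthly cost estimate is simple arithmetic on the per-million-token prices listed on this page at the time of writing ($0.20 input, $1.10 output, $0.03 cache read). The sketch assumes cached input tokens are billed at the cache-read rate instead of the normal input rate; that is an assumption, so check how your provider bills cache hits. The function and field names are hypothetical.

```typescript
// Per-million-token prices (USD) as listed on this page; they may change.
const PRICE_PER_M = { input: 0.2, output: 1.1, cacheRead: 0.03 };

interface MonthlyUsage {
  inputTokens: number; // all prompt tokens, including cached ones
  outputTokens: number;
  cachedInputTokens?: number; // portion of inputTokens served from cache
}

/**
 * Rough monthly cost in USD. Assumes cached input tokens are billed at the
 * cache-read rate in place of the input rate.
 */
function estimateMonthlyCostUSD(usage: MonthlyUsage): number {
  const cached = usage.cachedInputTokens ?? 0;
  if (cached > usage.inputTokens) {
    throw new RangeError("cachedInputTokens cannot exceed inputTokens");
  }
  const uncached = usage.inputTokens - cached;
  return (
    (uncached * PRICE_PER_M.input +
      cached * PRICE_PER_M.cacheRead +
      usage.outputTokens * PRICE_PER_M.output) /
    1_000_000
  );
}

// 100M input + 20M output tokens per month, no cache hits:
console.log(estimateMonthlyCostUSD({ inputTokens: 100e6, outputTokens: 20e6 }).toFixed(2)); // 42.00
// Same volume with 40M of the input tokens read from cache:
console.log(
  estimateMonthlyCostUSD({ inputTokens: 100e6, outputTokens: 20e6, cachedInputTokens: 40e6 }).toFixed(2),
); // 35.20
```

Running the same volumes through a heavier model's prices gives a direct comparison.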

When to Use GLM 4.5 Air

Best For

  • High-volume production pipelines: Low latency and cost efficiency per request outweigh peak reasoning depth
  • Classification and extraction tasks: Competent language understanding without extended chain-of-thought overhead
  • Conversational applications: Many concurrent users where response speed directly affects user experience
  • Summarization workflows: Large document sets where throughput determines pipeline feasibility
  • Development and prototyping: Fast iteration cycles benefit from quick model responses

Consider Alternatives When

  • Deep reasoning needed: The full GLM-4.5 or GLM-5 provides deeper deliberation capabilities for complex multi-step planning
  • Vision or image understanding: GLM-4.5V builds on GLM-4.5-Air with multimodal input support
  • Advanced code generation: GLM-4.6 and GLM-4.7 include targeted coding improvements
  • Lowest-cost simple tasks: Evaluate flash-tier models in the GLM lineup when further capability trade-offs are acceptable
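Because the GLM-4.5 family shares one API surface, routing between these options reduces to choosing a model identifier per task. This is a hypothetical router built from the two lists above; only `zai/glm-4.5-air` appears on this page, so `zai/glm-4.5` and `zai/glm-4.5v` are assumed identifiers to verify against the gateway's model list.

```typescript
// Task categories taken from the "Best For" and "Consider Alternatives" lists.
type TaskKind =
  | "classification"
  | "extraction"
  | "summarization"
  | "chat"
  | "deep-reasoning"
  | "vision";

// Only "zai/glm-4.5-air" is confirmed on this page; the others are assumptions.
const MODEL_FOR_TASK: Record<TaskKind, string> = {
  classification: "zai/glm-4.5-air",
  extraction: "zai/glm-4.5-air",
  summarization: "zai/glm-4.5-air",
  chat: "zai/glm-4.5-air",
  "deep-reasoning": "zai/glm-4.5",
  vision: "zai/glm-4.5v",
};

/** Hypothetical router: picks a model identifier for a task kind. */
function pickModel(task: TaskKind): string {
  return MODEL_FOR_TASK[task];
}

console.log(pickModel("extraction")); // zai/glm-4.5-air
console.log(pickModel("deep-reasoning")); // zai/glm-4.5
```

The same pattern fits agent pipelines: send planning steps to the heavier model and execution steps to GLM 4.5 Air, with the rest of the request unchanged.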

Conclusion

GLM 4.5 Air fills the efficiency tier in Z.ai's GLM-4.5 generation. It offers the practical balance teams need when deploying language models at scale: broad general capability with the speed and cost profile that high-volume production demands.

Frequently Asked Questions

  • How does GLM 4.5 Air compare to the full GLM-4.5?

    GLM 4.5 Air is the lightweight variant optimized for faster inference and lower cost. GLM-4.5 provides deeper reasoning capability at higher per-token cost. Both share the same API surface and context window.

  • What is GLM 4.5 Air best suited for?

    High-volume tasks where speed and cost matter: classification, extraction, summarization, and conversational applications. For deep reasoning tasks, consider the full GLM-4.5 or GLM-5.

  • What is the context window for GLM 4.5 Air?

    128K tokens, matching the full GLM-4.5 model.

  • How do I switch between GLM 4.5 Air and GLM-4.5?

    Change the model identifier in your API call. Both models share the same API interface, so no other integration changes are needed.

  • How do I authenticate with GLM 4.5 Air through AI Gateway?

    AI Gateway provides a unified API key. Configure it in your environment and use the model identifier to route requests. No separate Z.ai account is required, though bring-your-own-key (BYOK) is supported.

  • Is GLM 4.5 Air suitable for agentic workflows?

    Yes, for agent steps that prioritize speed over deep reasoning. For planning-heavy steps, route those to GLM-4.5 or GLM-5 while using GLM 4.5 Air for faster execution steps.

  • What is the pricing for GLM 4.5 Air?

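    Check the pricing panel on this page for current numbers; at the time of writing, Z.ai lists $0.20 per million input tokens, $1.10 per million output tokens, and $0.03 per million cache-read tokens. AI Gateway tracks rates across every provider that serves GLM 4.5 Air.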
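For the authentication answer above, "configure it in your environment" usually means an environment variable. This sketch assumes the variable names `AI_GATEWAY_API_KEY` (API key) and `VERCEL_OIDC_TOKEN` (OIDC on Vercel deployments); preferring the API key when both are set is a choice made here, not documented behavior. The helper `resolveGatewayAuth` is hypothetical and only reports which credential would be used.

```typescript
type GatewayAuth =
  | { kind: "api-key"; value: string }
  | { kind: "oidc"; value: string }
  | { kind: "missing" };

/**
 * Hypothetical helper: decides which credential a gateway client would use.
 * Variable names are assumptions; API-key-first ordering is this sketch's choice.
 */
function resolveGatewayAuth(env: Record<string, string | undefined>): GatewayAuth {
  const apiKey = env.AI_GATEWAY_API_KEY?.trim();
  if (apiKey) return { kind: "api-key", value: apiKey };
  const oidc = env.VERCEL_OIDC_TOKEN?.trim();
  if (oidc) return { kind: "oidc", value: oidc };
  return { kind: "missing" };
}

// Pass process.env in a real app; literal objects keep the example deterministic.
console.log(resolveGatewayAuth({ AI_GATEWAY_API_KEY: "example-key" }).kind); // api-key
console.log(resolveGatewayAuth({ VERCEL_OIDC_TOKEN: "example-token" }).kind); // oidc
console.log(resolveGatewayAuth({}).kind); // missing
```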