GLM 4.5 Air
GLM 4.5 Air is Z.ai's efficiency-focused model, released July 28, 2025. It delivers fast inference for high-volume workloads while retaining reasoning and coding capability at a lower cost than the full GLM-4.5.
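import { streamText } from 'ai'
const result = streamText({ model: 'zai/glm-4.5-air', prompt: 'Why is the sky blue?' })
// Print the response to stdout as chunks arrive
for await (const textPart of result.textStream) process.stdout.write(textPart)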
Try out GLM 4.5 Air by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About GLM 4.5 Air
GLM 4.5 Air was released July 28, 2025 as the efficiency-optimized variant in Z.ai's GLM-4.5 generation. Where GLM-4.5 targets maximum capability, GLM 4.5 Air trades a degree of depth for faster inference and lower per-token cost, making it practical for high-throughput production pipelines.
The model retains the core reasoning, coding, and agentic capabilities of the GLM-4.5 family while operating at reduced computational overhead. This positions it for workloads where response latency and cost per request are primary constraints: classification, extraction, summarization, and conversational applications that process high volumes of requests.
GLM 4.5 Air shares the full GLM-4.5 model's 128K-token context window. Through AI Gateway, it also gets unified API access, built-in observability, and provider routing with automatic retries.
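Because prompt and completion share the 128K-token window, a cheap pre-flight check can catch oversized requests before they are sent. The sketch below is illustrative only: it reads "128K" conservatively as 128,000 tokens, and its four-characters-per-token estimate is a crude assumption, not GLM's actual tokenizer. `estimateTokens` and `fitsContextWindow` are hypothetical helpers.

```typescript
// Assumption: "128K" is read conservatively as 128,000 tokens.
const CONTEXT_WINDOW_TOKENS = 128_000;
// Assumption: ~4 characters per token; a rough heuristic, not GLM's tokenizer.
const CHARS_PER_TOKEN_ESTIMATE = 4;

// Hypothetical helper: rough token estimate from character count, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN_ESTIMATE);
}

// Hypothetical helper: input and output tokens must fit in the same window.
function fitsContextWindow(prompt: string, maxOutputTokens: number): boolean {
  return estimateTokens(prompt) + maxOutputTokens <= CONTEXT_WINDOW_TOKENS;
}

console.log(fitsContextWindow('Summarize this report.', 4_096)); // true
console.log(fitsContextWindow('x'.repeat(600_000), 4_096)); // false
```

For production use, replace the heuristic with real token counts, such as the usage figures returned for earlier requests.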
Providers
AI Gateway can route requests across multiple providers; set a provider slug to express your preference. Visit the docs for more info. Using a provider means you agree to its terms, listed under Legal.
What To Consider When Choosing a Provider
- Capability: GLM 4.5 Air is optimized for speed. For tasks that need deep multi-step reasoning, the full GLM-4.5 or later models such as GLM-5 may produce better results.
- Cost: At $0.20 per million input tokens and $1.10 per million output tokens, GLM 4.5 Air is designed for workloads where unit economics matter. Estimate your monthly token volume to compare total cost against heavier alternatives.
- Compatibility: GLM 4.5 Air uses the same API interface as GLM-4.5, so switching between the two only requires changing the model identifier.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
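To put the per-million-token rates above into numbers, the sketch below turns an estimated monthly volume into a dollar figure. The default rates are the GLM 4.5 Air prices quoted on this page. Prices can change, so pass current rates from the pricing panel for GLM 4.5 Air or any model you compare against. The names `estimateMonthlyCostUsd`, `Rates`, and `MonthlyVolume` are hypothetical.

```typescript
// USD per million tokens.
interface Rates {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

// Rates quoted on this page for GLM 4.5 Air; confirm against the live pricing panel.
const GLM_45_AIR_RATES: Rates = { inputUsdPerMillion: 0.2, outputUsdPerMillion: 1.1 };

interface MonthlyVolume {
  requests: number;               // requests per month
  inputTokensPerRequest: number;  // average prompt tokens per request
  outputTokensPerRequest: number; // average completion tokens per request
}

// Hypothetical helper: total monthly cost = input tokens * input rate + output tokens * output rate.
function estimateMonthlyCostUsd(volume: MonthlyVolume, rates: Rates = GLM_45_AIR_RATES): number {
  const inputTokens = volume.requests * volume.inputTokensPerRequest;
  const outputTokens = volume.requests * volume.outputTokensPerRequest;
  return (
    (inputTokens / 1_000_000) * rates.inputUsdPerMillion +
    (outputTokens / 1_000_000) * rates.outputUsdPerMillion
  );
}

// 1M requests per month, averaging 500 input and 200 output tokens each.
const monthly = estimateMonthlyCostUsd({
  requests: 1_000_000,
  inputTokensPerRequest: 500,
  outputTokensPerRequest: 200,
});
console.log(monthly.toFixed(2)); // "320.00"
```

Here 500M input tokens at $0.20 per million cost $100, and 200M output tokens at $1.10 per million cost $220. Running the same volume with a heavier model's rates gives a direct comparison.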
When to Use GLM 4.5 Air
Best For
- High-volume production pipelines: Low latency and cost efficiency per request outweigh peak reasoning depth
- Classification and extraction tasks: Competent language understanding without extended chain-of-thought overhead
- Conversational applications: Many concurrent users where response speed directly affects user experience
- Summarization workflows: Large document sets where throughput determines pipeline feasibility
- Development and prototyping: Fast iteration cycles benefit from quick model responses
Consider Alternatives When
- Deep reasoning needed: The full GLM-4.5 or GLM-5 provides deeper deliberation capabilities for complex multi-step planning
- Vision or image understanding: GLM-4.5V builds on GLM-4.5-Air with multimodal input support
- Advanced code generation: GLM-4.6 and GLM-4.7 include targeted coding improvements
- Lowest-cost simple tasks: Evaluate flash-tier models in the GLM lineup when further capability tradeoffs are acceptable
Conclusion
GLM 4.5 Air fills the efficiency tier in Z.ai's GLM-4.5 generation. It offers the practical balance teams need when deploying language models at scale: broad general capability with the speed and cost profile that high-volume production demands.
Frequently Asked Questions
How does GLM 4.5 Air compare to the full GLM-4.5?
GLM 4.5 Air is the lightweight variant optimized for faster inference and lower cost. GLM-4.5 provides deeper reasoning capability at higher per-token cost. Both share the same API surface and context window.
What is GLM 4.5 Air best suited for?
High-volume tasks where speed and cost matter: classification, extraction, summarization, and conversational applications. For deep reasoning tasks, consider the full GLM-4.5 or GLM-5.
What is the context window for GLM 4.5 Air?
128K tokens, matching the full GLM-4.5 model.
How do I switch between GLM 4.5 Air and GLM-4.5?
Change the model identifier in your API call. Both models share the same API interface, so no other integration changes are needed.
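One way to keep that switch to a single change is to build request options in one place and vary only the model field. In this sketch, `'zai/glm-4.5-air'` is the identifier used on this page. `'zai/glm-4.5'` is an assumed slug for the full model, so confirm it in the AI Gateway model list. `buildRequest` is a hypothetical helper.

```typescript
// 'zai/glm-4.5' is an assumed slug; verify it before use.
type GlmModelId = 'zai/glm-4.5-air' | 'zai/glm-4.5';

interface GenerationRequest {
  model: GlmModelId;
  prompt: string;
}

// Hypothetical helper: identical options for either model; only `model` differs.
function buildRequest(model: GlmModelId, prompt: string): GenerationRequest {
  return { model, prompt };
}

const airRequest = buildRequest('zai/glm-4.5-air', 'Why is the sky blue?');
const fullRequest = buildRequest('zai/glm-4.5', 'Why is the sky blue?');
console.log(airRequest.model, fullRequest.model);
```

Either object has the same `model` and `prompt` fields used in the `streamText` call at the top of this page, so the surrounding integration code does not need to change.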
How do I authenticate with GLM 4.5 Air through AI Gateway?
AI Gateway provides a unified API key. Configure it in your environment and use the model identifier to route requests. No separate Z.ai account is required, though bring-your-own-key (BYOK) is supported.
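A startup check can fail fast when no credential is configured, instead of surfacing an authentication error on the first request. This is a sketch under assumptions: it expects the API key in `AI_GATEWAY_API_KEY` and an OIDC token in `VERCEL_OIDC_TOKEN`, so verify both variable names against the AI Gateway authentication docs. `detectGatewayCredential` is hypothetical, and checking the API key before OIDC is this sketch's own choice, not documented gateway behavior.

```typescript
type Env = Record<string, string | undefined>;

type GatewayCredential =
  | { kind: 'api-key'; variable: 'AI_GATEWAY_API_KEY' }
  | { kind: 'oidc'; variable: 'VERCEL_OIDC_TOKEN' };

// Hypothetical helper: reports which credential is present; empty strings count as missing.
function detectGatewayCredential(env: Env): GatewayCredential | null {
  if (env['AI_GATEWAY_API_KEY']) {
    return { kind: 'api-key', variable: 'AI_GATEWAY_API_KEY' };
  }
  if (env['VERCEL_OIDC_TOKEN']) {
    return { kind: 'oidc', variable: 'VERCEL_OIDC_TOKEN' };
  }
  return null; // No credential found.
}

const found = detectGatewayCredential({ AI_GATEWAY_API_KEY: 'example-key' });
console.log(found?.kind ?? 'missing');
```

In a Node.js process you would pass `process.env` and exit with an error when the result is `null`.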
Is GLM 4.5 Air suitable for agentic workflows?
Yes, for agent steps that prioritize speed over deep reasoning. For planning-heavy steps, route those to GLM-4.5 or GLM-5 while using GLM 4.5 Air for faster execution steps.
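The split described above can be expressed as a small routing table that assigns a model to each agent step by kind. This is an illustrative sketch. `'zai/glm-4.5'` is an assumed slug for the full model, `StepKind` and `routeSteps` are hypothetical names, and the model call itself is left out. To plan with GLM-5 instead, substitute its identifier from the AI Gateway model list.

```typescript
type StepKind = 'plan' | 'execute';

interface AgentStep {
  kind: StepKind;
  instruction: string;
}

// Planning-heavy steps go to the full model; fast execution steps go to GLM 4.5 Air.
// 'zai/glm-4.5' is an assumed slug; verify it before use.
const MODEL_FOR_STEP: Record<StepKind, string> = {
  plan: 'zai/glm-4.5',
  execute: 'zai/glm-4.5-air',
};

// Hypothetical helper: attaches a model identifier to each step without mutating the input.
function routeSteps(steps: AgentStep[]): Array<AgentStep & { model: string }> {
  return steps.map((step) => ({ ...step, model: MODEL_FOR_STEP[step.kind] }));
}

const routed = routeSteps([
  { kind: 'plan', instruction: 'Break the migration into ordered tasks.' },
  { kind: 'execute', instruction: 'Rename the config field in settings.ts.' },
  { kind: 'execute', instruction: 'Summarize the diff for the changelog.' },
]);
for (const step of routed) console.log(step.model, '-', step.instruction);
```

Keeping the mapping in one table makes it easy to move a step kind to a different model after measuring quality and cost on your own workload.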
What is the pricing for GLM 4.5 Air?
Check the pricing panel on this page for current rates. AI Gateway tracks rates across every provider that serves GLM 4.5 Air.