GLM 4.5 Air was released July 28, 2025 as the efficiency-optimized variant in Z.ai's GLM-4.5 generation. Where GLM-4.5 targets maximum capability, GLM 4.5 Air trades a degree of depth for faster inference and lower per-token cost, making it practical for high-throughput production pipelines.
The model retains the core reasoning, coding, and agentic capabilities of the GLM-4.5 family while operating at reduced computational overhead. This positions it for workloads where response latency and cost per request are primary constraints: classification, extraction, summarization, and conversational applications that process high volumes of requests.
GLM 4.5 Air supports the same context window of 128K tokens as the full GLM-4.5 model. Through AI Gateway, it benefits from unified API access, built-in observability, and intelligent provider routing with automatic retries.