GLM 4.7 FlashX

GLM 4.7 FlashX is the ultra-fast inference variant in Z.ai's GLM-4.7 generation, released January 1, 2025. Designed for the lowest-latency workloads, it provides the fastest response times in the GLM-4.7 family while retaining core coding and reasoning capabilities.

Capabilities: Reasoning, Tool Use, Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-4.7-flashx',
  prompt: 'Why is the sky blue?',
})
// Stream the generated tokens to stdout as they arrive.
for await (const text of result.textStream) {
  process.stdout.write(text)
}

Frequently Asked Questions

  • How fast is GLM 4.7 FlashX compared to other GLM-4.7 variants?

    GLM 4.7 FlashX is the fastest inference tier in the GLM-4.7 generation: it has the lowest latency, followed by GLM-4.7-Flash, then the full GLM-4.7.

  • What capability tradeoffs does GLM 4.7 FlashX make?

    It trades peak reasoning and coding depth for speed. Core capabilities are retained, but the most complex multi-step reasoning and code generation tasks will produce better results on GLM-4.7 or GLM-4.7-Flash.

  • Can I mix GLM 4.7 FlashX with other GLM-4.7 models?

    Yes. All GLM-4.7 variants share the same API surface. Route simple requests to GLM 4.7 FlashX for speed and complex ones to GLM-4.7 for quality; a routing sketch appears after this FAQ.

  • What is the context window for GLM 4.7 FlashX?

    200K tokens.

  • How do I authenticate with GLM 4.7 FlashX through AI Gateway?

    AI Gateway provides a unified API key, so no separate Z.ai account is needed. Use the model identifier to route requests; a minimal setup sketch appears after this FAQ. BYOK is also supported.

  • What workloads is GLM 4.7 FlashX best for?

    Real-time user-facing applications, high-frequency API calls, simple classification and extraction, and any workload where response latency is the primary constraint.

  • How does pricing compare to other GLM-4.7 variants?

    Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves GLM 4.7 FlashX.
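
The routing pattern from the FAQ (FlashX for simple requests, the full model for complex ones) takes only a few lines with the AI SDK. The sketch below is illustrative: the length-based heuristic and the `zai/glm-4.7` identifier for the full model are assumptions, so substitute whatever complexity signal and model IDs your application uses.

route.ts
import { generateText } from 'ai'

// Hypothetical heuristic: treat short prompts as simple. The
// 'zai/glm-4.7' identifier for the full model is assumed here.
function pickModel(prompt: string): string {
  return prompt.length < 200 ? 'zai/glm-4.7-flashx' : 'zai/glm-4.7'
}

const prompt = 'Label this ticket as billing, bug, or feature request.'
const { text } = await generateText({ model: pickModel(prompt), prompt })
console.log(text)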
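
For the unified-key setup, a minimal sketch using the AI SDK's gateway provider is shown below. The `createGateway` configuration and the `AI_GATEWAY_API_KEY` variable name reflect the AI SDK's defaults as understood here; confirm the exact names against the AI Gateway documentation.

auth.ts
import { generateText } from 'ai'
import { createGateway } from '@ai-sdk/gateway'

// One gateway key covers every provider, so no Z.ai account is
// needed. Passing apiKey explicitly is optional: by default the
// provider reads AI_GATEWAY_API_KEY from the environment (assumed).
const gateway = createGateway({ apiKey: process.env.AI_GATEWAY_API_KEY })

const { text } = await generateText({
  model: gateway('zai/glm-4.7-flashx'), // model ID routes to Z.ai
  prompt: 'Reply with OK.',
})
console.log(text)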