GLM 4.7 Flash

GLM 4.7 Flash is the speed-optimized variant in Z.ai's GLM-4.7 generation. It delivers faster inference for high-throughput workloads while retaining the coding, tool-use, and conversational improvements introduced in GLM-4.7.

Capabilities: Reasoning · Tool Use
index.ts
```ts
import { streamText } from 'ai'

// Stream a completion from GLM 4.7 Flash through AI Gateway.
const result = streamText({
  model: 'zai/glm-4.7-flash',
  prompt: 'Why is the sky blue?',
})

// Print tokens as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}
```

Frequently Asked Questions

  • How does GLM 4.7 Flash compare to the full GLM-4.7?

    GLM 4.7 Flash shares the same foundational improvements (coding, tool usage, multi-step reasoning, natural tone) but is optimized for faster inference at lower cost. Peak capability on complex tasks will be lower than GLM-4.7.

  • What is the difference between GLM 4.7 Flash and GLM-4.7-FlashX?

    GLM 4.7 Flash provides more capability with moderate speed optimization. GLM-4.7-FlashX is the fastest tier in the generation, trading more capability for the lowest possible latency.

  • Can I switch between GLM-4.7 variants easily?

    Yes. All variants share the same API surface. Change the model identifier to switch between GLM-4.7, GLM-4.7-Flash, and GLM-4.7-FlashX.
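Since only the model identifier changes between variants, the switch can be sketched as a small helper. This is a minimal illustration, not part of the AI Gateway API; the `glmModelId` function and its tier names are hypothetical, and only the `zai/glm-4.7-flash` identifier appears in this page's own example.

```typescript
// Hypothetical helper: map a latency tier to a GLM-4.7 model identifier.
// The 'zai/glm-4.7' and 'zai/glm-4.7-flashx' ids follow the same naming
// pattern as the documented 'zai/glm-4.7-flash' and are assumptions.
type GlmTier = 'full' | 'flash' | 'flashx'

function glmModelId(tier: GlmTier): string {
  switch (tier) {
    case 'full':
      return 'zai/glm-4.7'
    case 'flash':
      return 'zai/glm-4.7-flash'
    case 'flashx':
      return 'zai/glm-4.7-flashx'
  }
}
```

The returned string is what you pass as the `model` option to a call such as `streamText`; no other code changes are needed to move between variants.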

  • What is the context window for GLM 4.7 Flash?

    200K tokens.
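For long prompts it can help to pre-check against that 200K-token limit before sending a request. The sketch below uses a rough chars-divided-by-4 heuristic, which is an assumption for illustration, not Z.ai's actual tokenizer; the function names are hypothetical.

```typescript
// Rough pre-flight check against the 200K-token context window.
// The chars/4 token estimate is a common heuristic, NOT the real tokenizer.
const CONTEXT_WINDOW = 200_000

function roughTokenCount(text: string): number {
  return Math.ceil(text.length / 4)
}

// Leave headroom for the model's output when budgeting the prompt.
function fitsContext(prompt: string, reservedForOutput = 8_000): boolean {
  return roughTokenCount(prompt) + reservedForOutput <= CONTEXT_WINDOW
}
```

A prompt that fails this check should be truncated or split before being sent, rather than relying on the provider to reject it.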

  • How do I authenticate with GLM 4.7 Flash through AI Gateway?

    AI Gateway provides a unified API key. No separate Z.ai account is needed. Use the model identifier to route requests. BYOK is supported for direct provider accounts.
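A minimal setup sketch for that unified key, assuming it is supplied via an `AI_GATEWAY_API_KEY` environment variable (the variable name and the `gatewayKey` helper are assumptions for illustration):

```typescript
// Minimal sketch: resolve the gateway key from the environment and fail
// fast with a clear message if it is missing.
// AI_GATEWAY_API_KEY is an assumed variable name for this example.
function gatewayKey(env: Record<string, string | undefined>): string {
  const key = env.AI_GATEWAY_API_KEY
  if (!key) {
    throw new Error(
      'Set AI_GATEWAY_API_KEY (or configure BYOK) before routing requests'
    )
  }
  return key
}
```

In practice you would call `gatewayKey(process.env)` once at startup; the same key then authorizes every model identifier routed through the gateway, including `zai/glm-4.7-flash`.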

  • Is GLM 4.7 Flash suitable for frontend development?

    Yes. It inherits the frontend development improvements from GLM-4.7, though the full GLM-4.7 may produce slightly better results on complex UI generation tasks.

  • What is the pricing for GLM 4.7 Flash?

    See the pricing section on this page for today's rates. AI Gateway exposes each provider's pricing for GLM 4.7 Flash.