Grok 4 Fast Non-Reasoning

Grok 4 Fast Non-Reasoning is the speed-optimized, non-reasoning variant of xAI's Grok 4 Fast. It delivers fast inference without chain-of-thought overhead and targets high-throughput applications, with a context window of 2M tokens.

Tool Use, Implicit Caching, Tiered Cost
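index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'xai/grok-4-fast-non-reasoning',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}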

Playground

Try out Grok 4 Fast Non-Reasoning by xAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

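| Provider | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Capabilities | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| xAI | 2M | | | $0.20/M | $0.50/M | $0.05/M | | | | | | 09/19/2025 |

Legal (xAI): Terms, Privacy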
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by xAI

| Model | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Capabilities | Providers | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1M | 1.2s | 160tps | $1.25/M | $2.50/M | $0.20/M | | $5/K + input costs | | xAI | | | 04/30/2026 |
| | 2M | 3.9s | 911tps | $1.25/M | $2.50/M | $0.20/M | | $5/K + input costs | | xAI | | | 03/11/2026 |
| | 2M | 0.8s | 118tps | $1.25/M | $2.50/M | $0.20/M | | $5/K + input costs | | Vertex, xAI | | | 03/09/2026 |
| | 2M | 3.9s | 775tps | $1.25/M | $2.50/M | $0.20/M | | $5/K + input costs | | xAI | | | 03/09/2026 |
| | 1M | 0.2s | 91tps | $0.20/M | $0.50/M | $0.05/M | | | | Vertex | | | 07/09/2025 |
| | 1M | 4.4s | 249tps | $0.20/M | $0.50/M | $0.05/M | | | | Vertex | | | 07/09/2025 |

About Grok 4 Fast Non-Reasoning

Grok 4 Fast Non-Reasoning is the non-reasoning configuration of xAI's Grok 4 Fast model, released September 19, 2025. It disables the extended chain-of-thought reasoning process, producing direct answers without intermediate reasoning traces. This eliminates reasoning token overhead and reduces both latency and output cost per request.

The model builds on the Grok 4 training foundation, carrying forward its language understanding and instruction following capabilities, but operates in a direct-response mode optimized for speed. With a context window of 2M tokens, it handles general-purpose tasks including text generation, summarization, classification, and tool calling.

Grok 4 Fast Non-Reasoning is available through Vercel AI Gateway at $0.20 per million input tokens and $0.50 per million output tokens, with cached input reads at $0.05 per million tokens. It pairs naturally with its reasoning counterpart: use the non-reasoning variant for straightforward tasks and the reasoning variant when analytical depth is needed.
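
To make the pricing concrete, here is a minimal sketch of a per-request cost estimate. It assumes the flat rates listed on this page and ignores cached-input discounts and any tiered pricing. The helper name estimateCostUSD and the constant names are illustrative and not part of any SDK.

```typescript
// Rates listed on this page for xai/grok-4-fast-non-reasoning.
// Assumption: flat rates; cache-read discounts and tiered pricing are ignored.
const INPUT_USD_PER_MILLION = 0.2
const OUTPUT_USD_PER_MILLION = 0.5

// Hypothetical helper: estimates the cost of one request in US dollars.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MILLION
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MILLION
  return inputCost + outputCost
}

// Example: a 10,000-token prompt with a 1,000-token reply.
const exampleCost = estimateCostUSD(10_000, 1_000)
console.log(`Estimated cost: $${exampleCost.toFixed(4)}`)
```

Under these assumptions the example request costs roughly a quarter of a cent. Because provider rates can change, treat the constants as placeholders to update from the pricing table.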

What To Consider When Choosing a Provider

  • Configuration: This variant produces direct answers without chain-of-thought output. If you need to inspect the model's reasoning process or require multi-step analytical depth, use the reasoning variant instead.
  • Throughput: Without reasoning overhead, Grok 4 Fast Non-Reasoning delivers higher tokens-per-second throughput than the reasoning variant, which suits streaming applications and high-volume pipelines.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
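
The authentication point above can be sketched as a small credential resolver. This is an illustration under stated assumptions, not the gateway's own logic: the environment variable names AI_GATEWAY_API_KEY and VERCEL_OIDC_TOKEN, the preference for the API key over the OIDC token, and the bearer-token header are all assumptions to verify against the AI Gateway docs. The type and function names are hypothetical.

```typescript
// Hypothetical shape for a resolved credential; not an SDK type.
type GatewayCredential = { kind: 'api-key' | 'oidc'; token: string }

// Assumption: the API key is stored in AI_GATEWAY_API_KEY and the OIDC token
// in VERCEL_OIDC_TOKEN. Confirm the names for your deployment.
function resolveGatewayCredential(
  env: Record<string, string | undefined>,
): GatewayCredential {
  const apiKey = env['AI_GATEWAY_API_KEY']
  if (apiKey) return { kind: 'api-key', token: apiKey }

  const oidcToken = env['VERCEL_OIDC_TOKEN']
  if (oidcToken) return { kind: 'oidc', token: oidcToken }

  throw new Error('No AI Gateway credential found: set an API key or an OIDC token')
}

// Assumption: either credential is sent as a bearer token.
// No provider (xAI) key appears anywhere in the request.
function authHeader(credential: GatewayCredential): Record<string, string> {
  return { Authorization: `Bearer ${credential.token}` }
}
```

In application code you would pass process.env as the env argument. Failing fast when neither credential is present keeps a misconfigured deployment from surfacing later as an opaque 401.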

When to Use Grok 4 Fast Non-Reasoning

Best For

  • High-throughput production APIs: Direct answers at low latency serve end users best
  • Chat and conversational interfaces: Users expect fast, natural responses without verbose reasoning
  • Text generation and content creation: Drafting, editing, and rephrasing tasks where throughput matters more than deep reasoning
  • Classification and routing pipelines: Inputs need to be categorized quickly before downstream processing
  • Tool-calling agentic workflows: The model needs to decide and act quickly rather than deliberate

Consider Alternatives When

  • Complex analytical tasks: Work that requires multi-step reasoning is better served by the Grok 4 Fast Reasoning variant or the full Grok 4
  • Competition-level math or science: Chain-of-thought reasoning produces measurably better accuracy on these problems
  • Tasks where showing reasoning builds trust: In medical or legal analysis, for example, the reasoning variant exposes its thinking
  • Maximum cost efficiency on simple tasks: Grok 3 Mini Fast offers even lower per-token costs

Conclusion

Grok 4 Fast Non-Reasoning strips away reasoning overhead to deliver the Grok 4 foundation at maximum speed. Use it for production workloads that need direct answers without chain-of-thought latency or token cost. Pair it with the reasoning variant for a two-tier architecture that matches model capability to task complexity.
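
The two-tier idea can be sketched as a small routing function. This is one possible arrangement, not a prescribed architecture. The Task type, the task categories, and the pickModel name are hypothetical, and the reasoning variant's slug is an assumption that should be checked against the model list.

```typescript
// The non-reasoning slug appears on this page.
const FAST_MODEL = 'xai/grok-4-fast-non-reasoning'
// Assumed slug for the reasoning variant; verify before use.
const REASONING_MODEL = 'xai/grok-4-fast-reasoning'

// Hypothetical task descriptor used only for this illustration.
interface Task {
  kind: 'chat' | 'classification' | 'drafting' | 'tool-call' | 'analysis' | 'math'
  needsVisibleReasoning?: boolean
}

// Task kinds that this sketch sends to the reasoning tier.
const REASONING_KINDS = new Set<Task['kind']>(['analysis', 'math'])

// Route analytical work, or work that must show its reasoning, to the
// reasoning variant; everything else goes to the fast variant.
function pickModel(task: Task): string {
  if (task.needsVisibleReasoning || REASONING_KINDS.has(task.kind)) {
    return REASONING_MODEL
  }
  return FAST_MODEL
}

console.log(pickModel({ kind: 'classification' }))
```

The returned string can be passed as the model value in a streamText call. In practice the routing signal might come from a classifier or from request metadata rather than a hand-written category.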

Frequently Asked Questions

  • What does 'non-reasoning' mean for Grok 4 Fast Non-Reasoning?

    The model produces direct answers without generating chain-of-thought reasoning traces. This reduces latency and output token consumption compared to the reasoning variant.

  • How does Grok 4 Fast Non-Reasoning differ from Grok 4 Fast Reasoning?

    Both share the same Grok 4 Fast foundation. The reasoning variant generates chain-of-thought traces for analytical tasks, while Grok 4 Fast Non-Reasoning produces direct responses optimized for speed.

  • What is the context window?

    2M tokens.

  • What does Grok 4 Fast Non-Reasoning cost?

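    Current rates are listed on this page: $0.20 per million input tokens and $0.50 per million output tokens at the time of writing. They reflect the providers routing through AI Gateway and change when providers update their pricing.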

  • How do I authenticate with Grok 4 Fast Non-Reasoning through Vercel AI Gateway?

    Use your Vercel AI Gateway API key with xai/grok-4-fast-non-reasoning as the model identifier. No separate xAI account is required for gateway-managed access.

  • Can Grok 4 Fast Non-Reasoning call tools and functions?

    Yes. Grok 4 Fast Non-Reasoning supports tool calling and function invocation, making it suitable for agentic workflows that need fast decision-making.

  • Does Vercel AI Gateway support Zero Data Retention for Grok 4 Fast Non-Reasoning?

    Zero Data Retention is not currently available for this model. ZDR is offered on a per-provider basis; see https://vercel.com/docs/ai-gateway/capabilities/zdr for details.