Grok 3 Mini Fast Beta

Grok 3 Mini Fast Beta is the fastest and most cost-efficient model in xAI's Grok 3 family. It combines the compact Grok 3 Mini architecture with speed optimization for maximum throughput at the lowest per-token cost.

Tool Use
index.ts
import { streamText } from 'ai';

const result = streamText({
  model: 'xai/grok-3-mini-fast',
  prompt: 'Why is the sky blue?',
});

// Consume the response tokens as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

Playground

Try out Grok 3 Mini Fast Beta by xAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

About Grok 3 Mini Fast Beta

Grok 3 Mini Fast Beta is the speed-optimized configuration of the Grok 3 Mini model, released February 17, 2025. It represents the most cost-efficient option in the Grok 3 lineup, combining the compact architecture of Grok 3 Mini with additional inference optimization for maximum tokens-per-second throughput.

With a context window of 131.1K tokens, the model handles standard language tasks including classification, extraction, summarization, and basic code generation. It's engineered for workloads where volume and speed dominate the requirements, rather than deep reasoning or analytical depth.
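Whether a given workload fits within that window can be estimated up front. The sketch below uses a rough four-characters-per-token heuristic (an assumption; a real tokenizer gives exact counts) against the 131,072-token window:

```typescript
// Grok 3 Mini Fast Beta context window (131.1K ≈ 131,072 tokens).
const CONTEXT_WINDOW = 131_072;

// Rough heuristic: ~4 characters per token for English prose.
// This is an approximation; use a real tokenizer for exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check whether a prompt plus a reserved output budget fits the window.
function fitsInContext(prompt: string, reservedOutputTokens = 4_096): boolean {
  return estimateTokens(prompt) + reservedOutputTokens <= CONTEXT_WINDOW;
}
```

For long-document summarization or extraction, a check like this lets you decide between sending the full input and chunking it before the request is made.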

Grok 3 Mini Fast Beta is available at $0.60 per million input tokens and $4.00 per million output tokens through Vercel AI Gateway. At this price point, it enables use cases that would be cost-prohibitive with larger models, such as processing millions of records per day or powering high-traffic consumer applications.
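At those rates, per-request cost is straightforward to model. The helper below is a hypothetical sketch that hardcodes this model's current Gateway pricing, which may change:

```typescript
// Grok 3 Mini Fast Beta pricing via AI Gateway (subject to change).
const INPUT_USD_PER_M = 0.6;  // $0.60 per million input tokens
const OUTPUT_USD_PER_M = 4.0; // $4.00 per million output tokens

// Estimate the cost of a single request in USD.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_M
  );
}

// Example: a 1,000-token prompt with a 500-token completion.
const perRequest = estimateCostUSD(1_000, 500); // ≈ $0.0026
// One million such requests per day ≈ $2,600/day.
const perDay = perRequest * 1_000_000;
```

Running this kind of arithmetic against your expected volume is the quickest way to compare model tiers before committing to one.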

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider  Context  Latency  Throughput  Input    Output   Release Date
xAI       131K     0.3s     122 tps     $0.60/M  $4.00/M  02/17/2025

Legal: Terms, Privacy
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.


What To Consider When Choosing a Provider

  • Cost modeling: At $0.60 input and $4.00 output per million tokens, Grok 3 Mini Fast Beta enables workloads at scale that would be impractical with larger Grok models. Model the total cost for your expected volume before selecting a model tier.
  • Quality validation: Run representative samples through Grok 3 Mini Fast Beta to confirm it meets your quality threshold. For many classification and extraction tasks, it performs comparably to larger variants.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use Grok 3 Mini Fast Beta

Best For

  • High-traffic consumer applications: Millions of requests per day make per-token cost the dominant factor
  • Text classification and routing: Pipelines that categorize inputs before dispatching to specialized models or workflows
  • Data extraction at scale: Entity recognition, attribute extraction, and structured output from unstructured text
  • Autocomplete and suggestion systems: Sub-second latency is essential for user experience
  • Preprocessing pipelines: Filter, summarize, or transform data before more expensive downstream processing
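The classification-and-routing pattern above can be sketched as follows. The model call is stubbed out here; in practice you would prompt grok-3-mini-fast to return one of the known labels, and the label set and handlers shown are illustrative assumptions:

```typescript
// Labels the classifier model is prompted to choose from (illustrative).
type Label = 'billing' | 'technical' | 'general';

// Downstream handlers; in a real pipeline these might dispatch to
// specialized models or workflows.
const handlers: Record<Label, (input: string) => string> = {
  billing: (input) => `billing queue: ${input}`,
  technical: (input) => `technical queue: ${input}`,
  general: (input) => `general queue: ${input}`,
};

// Route a model-produced label, falling back to 'general' on anything
// unexpected — cheap, fast models occasionally emit off-list labels.
function route(label: string, input: string): string {
  const key: Label = label in handlers ? (label as Label) : 'general';
  return handlers[key](input);
}
```

Constraining the model to a fixed label set and validating its output before dispatch keeps the fast, inexpensive classifier from propagating errors into the rest of the pipeline.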

Consider Alternatives When

  • Reasoning-intensive tasks: Quality suffers noticeably; step up to Grok 3 Mini, Grok 3 Fast, or the full Grok 3
  • Complex code generation: Multi-step planning and debugging are better handled by Grok 3 Fast or Grok Code Fast 1
  • Long-form analytical writing: Where depth and nuance are expected, larger Grok models produce noticeably better results
  • Tasks where errors are costly: For legal or medical applications, use a more capable model and validate outputs

Conclusion

Grok 3 Mini Fast Beta maximizes throughput and minimizes cost within the Grok 3 family. It's the right choice when your workload is defined by volume and speed rather than reasoning complexity. For teams building high-scale pipelines or consumer-facing features on a budget, it provides the most efficient entry point into xAI's model ecosystem.

Frequently Asked Questions

  • What is Grok 3 Mini Fast Beta optimized for?

    Maximum inference speed and minimum cost per token. It combines the compact Grok 3 Mini architecture with additional speed optimization for the highest throughput in the Grok 3 family.

  • How does Grok 3 Mini Fast Beta compare to Grok 3 Mini?

    Grok 3 Mini Fast Beta adds latency optimization on top of the Grok 3 Mini architecture. It's faster but may trade marginal quality on complex reasoning tasks.

  • What is the context window?

    131.1K tokens.

  • What does Grok 3 Mini Fast Beta cost?

Current rates are listed on this page. They reflect the providers routed through AI Gateway and may change when providers update their pricing.

  • How do I authenticate with Grok 3 Mini Fast Beta through Vercel AI Gateway?

    Use your Vercel AI Gateway API key with xai/grok-3-mini-fast as the model identifier. No separate xAI account is needed for gateway-managed access.

  • Can Grok 3 Mini Fast Beta handle code generation?

    It handles basic code generation, boilerplate, and simple transformations. For complex coding tasks, Grok 3 Fast or Grok Code Fast 1 are better suited.

  • Does Vercel AI Gateway support Zero Data Retention for Grok 3 Mini Fast Beta?

    Zero Data Retention is not currently available for this model. ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.