
Grok 2 Vision

The Grok 2 Vision model excels at vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and document-based question answering (DocVQA). It can process a wide variety of visual inputs, including documents, diagrams, charts, screenshots, and photographs.

Tool Use, Vision (Image)
index.ts
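import { streamText } from 'ai'

// Stream a response from Grok 2 Vision and print the text as it arrives.
const result = streamText({
  model: 'xai/grok-2-vision',
  prompt: 'Why is the sky blue?',
})
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}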
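
Since the model accepts images, a request would typically carry an image alongside the text. The following is a minimal sketch of building such a message in the multi-part user-content shape (a text part plus an image part). The helper name `buildVisionMessage` and the example URL are hypothetical, and the types are declared locally so the snippet runs without any dependency.

```typescript
// Minimal local types for a multi-part user message (text part + image part).
// Declared here so the sketch has no dependencies.
type TextPart = { type: 'text'; text: string }
type ImagePart = { type: 'image'; image: URL }
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> }

// Hypothetical helper: pairs a question with an image URL in one user message.
function buildVisionMessage(question: string, imageUrl: string): UserMessage {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      { type: 'image', image: new URL(imageUrl) }, // throws on a malformed URL
    ],
  }
}

// Placeholder URL, for illustration only.
const message = buildVisionMessage(
  'What does this chart show?',
  'https://example.com/chart.png',
)
console.log(JSON.stringify(message, null, 2))
```

Assuming the AI SDK's `messages` option, the result could be passed as `messages: [message]` in place of `prompt` in a `streamText` call.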

More models by xAI

| Model | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search (Per Query) | Capabilities | Providers | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1M | 1.2s | 79tps | $1.25/M | $2.50/M | $0.2/M | | $5/K + input costs | | xAI | | | 04/30/2026 |
| | 2M | 4.8s | 571tps | $1.25/M | $2.50/M | $0.2/M | | $5/K + input costs | | xAI | | | 03/11/2026 |
| | 2M | 0.7s | 65tps | $0.20/M | $0.50/M | $0.05/M | | $5/K + input costs | | xAI | | | 09/19/2025 |
| | 256K | 0.4s | 93tps | $0.20/M | $1.50/M | $0.02/M | | | | xAI | | | 08/28/2025 |
| | 2M | 0.2s | 189tps | $0.20/M | $0.50/M | $0.05/M | | $5/K + input costs | | Vertex, xAI | | | 07/09/2025 |
| | 2M | 0.7s | 342tps | $0.20/M | $0.50/M | $0.05/M | | $5/K + input costs | | Vertex, xAI | | | 07/09/2025 |
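
The per-million-token prices listed above translate into a per-request cost with simple arithmetic. The sketch below uses the rates from one of the listed entries ($0.20/M input, $0.50/M output, $0.05/M cache read). The helper `estimateCostUSD` is hypothetical, and it assumes that cached input tokens are billed at the cache-read rate instead of the input rate; actual billing rules may differ.

```typescript
// Prices in USD per one million tokens.
type Pricing = { inputPerM: number; outputPerM: number; cacheReadPerM?: number }

// Hypothetical helper: estimates the cost of a single request.
// Assumption: cached input tokens are charged at the cache-read rate
// (falling back to the input rate when no cache-read price is given).
function estimateCostUSD(
  p: Pricing,
  inputTokens: number,
  outputTokens: number,
  cachedInputTokens = 0,
): number {
  const freshInput = inputTokens - cachedInputTokens
  const cacheRate = p.cacheReadPerM ?? p.inputPerM
  const total =
    freshInput * p.inputPerM +
    cachedInputTokens * cacheRate +
    outputTokens * p.outputPerM
  return total / 1_000_000
}

const rates: Pricing = { inputPerM: 0.2, outputPerM: 0.5, cacheReadPerM: 0.05 }

// 10,000 input tokens (4,000 of them served from cache) and 2,000 output tokens.
const cost = estimateCostUSD(rates, 10_000, 2_000, 4_000)
console.log(cost.toFixed(6))
```

Here the estimate is (6,000 × 0.20 + 4,000 × 0.05 + 2,000 × 0.50) / 1,000,000, or about $0.0024. Web search charges are not included in this sketch.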