Claude 3 Haiku
Claude 3 Haiku handles enterprise document workloads at a fraction of Opus-tier cost, serving as the speed-and-affordability anchor of the Claude 3 family.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-3-haiku',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
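In practice this means the only credential your application needs is the gateway key itself. A minimal sketch, assuming the key is supplied through an environment variable (the variable name follows Vercel's AI Gateway convention; adjust if your deployment differs):

```shell
# Assumed setup: the AI SDK picks up the gateway key from an
# environment variable, so no Anthropic/Bedrock/Vertex keys are needed.
export AI_GATEWAY_API_KEY="your-gateway-key"
```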
At $0.25 per million input tokens and $1.25 per million output tokens, Haiku keeps high-volume annotation and chat pipelines economical. Use AI Gateway's per-request cost tracking to compare actual spend against Sonnet or newer Haiku generations.
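The arithmetic behind those rates is worth making concrete. A small sketch using the prices quoted above (the daily token volumes are illustrative):

```typescript
// Estimate Claude 3 Haiku spend from token counts, using the rates
// quoted on this page: $0.25 / 1M input tokens, $1.25 / 1M output tokens.
const INPUT_PER_MTOK = 0.25
const OUTPUT_PER_MTOK = 1.25

function haikuCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PER_MTOK +
    (outputTokens / 1_000_000) * OUTPUT_PER_MTOK
  )
}

// A hypothetical labeling pipeline pushing 10M input and 2M output
// tokens per day costs 10 × $0.25 + 2 × $1.25 = $5.00/day.
const dailySpend = haikuCostUsd(10_000_000, 2_000_000)
```

Running the same volumes against Sonnet-tier rates makes the cost gap explicit, which is what the per-request tracking is for.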
When to Use Claude 3 Haiku
Best For
High-volume document analysis:
Processing contracts, filings, and legal cases where throughput drives pipeline feasibility
Customer support chat:
Requiring fast response times and consistent instruction following across long conversation histories
Image and chart annotation at scale:
Using shared Claude 3 vision capabilities at the lowest per-image cost in the family
Data extraction and labeling pipelines:
Running extraction and labeling jobs where cost per token is the binding constraint and mid-tier reasoning suffices
Enterprise content moderation:
Screening large queues of text and images with fast turnaround
Consider Alternatives When
Deep multi-step reasoning:
Sonnet or Opus handles complex code generation and deep reasoning better
Extended thinking or computer use:
Those capabilities arrived in later model generations
Long-context prompts:
Haiku's throughput drops on prompts exceeding 32K tokens
Claude 3.5-era improvements:
Claude 3.5 Haiku matches Claude 3 Opus on many benchmarks while keeping Haiku-class speed
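The long-context caveat above is often worked around by splitting large documents before they reach the model. A minimal chunking sketch, assuming a coarse 4-characters-per-token heuristic (real counts come from the tokenizer):

```typescript
// Hypothetical helper: split a large document into pieces that stay
// under the 32K-token prompt size where Haiku's throughput drops.
const MAX_PROMPT_TOKENS = 32_000
const CHARS_PER_TOKEN = 4 // rough heuristic, not a tokenizer

function chunkDocument(text: string, maxTokens = MAX_PROMPT_TOKENS): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN
  const chunks: string[] = []
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars))
  }
  return chunks
}
```

A production version would split on paragraph or section boundaries rather than raw character offsets, but the budget arithmetic is the same.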
Conclusion
Claude 3 Haiku occupies the speed-and-cost floor of the Claude 3 generation. It remains a practical choice for teams running high-volume, latency-sensitive pipelines where the task complexity fits within a fast-tier model's capability range. Later Haiku generations raised the capability ceiling significantly, but the original Claude 3 Haiku still serves workloads optimized for raw throughput.
FAQ
How fast is Claude 3 Haiku compared to Sonnet and Opus?
Anthropic described Claude 3 Haiku as three times faster than peer models in its performance tier. Sonnet and Opus are slower, with Opus delivering speeds comparable to Claude 2.
Does prompt length affect throughput?
Prompts exceeding 32K tokens reduce throughput meaningfully. Factor this slowdown into latency estimates if your workload regularly involves long-context inputs.
Does Claude 3 Haiku support image inputs?
Yes. Haiku shares the same vision architecture as Sonnet and Opus. It processes photos, charts, graphs, and technical diagrams. Anthropic highlighted enterprise document analysis and large-scale image annotation as primary vision use cases.
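Images are passed alongside text as content parts in a user message. A sketch of the message shape the AI SDK accepts for multimodal input (the URL and prompt text are placeholders):

```typescript
// Shape of a multimodal user message for the AI SDK's `messages`
// option. The chart URL below is a placeholder, not a real asset.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image'; image: string } // URL or base64-encoded data

const messages = [
  {
    role: 'user' as const,
    content: [
      { type: 'text', text: 'Summarize the revenue trend in this chart.' },
      { type: 'image', image: 'https://example.com/q3-revenue.png' },
    ] as ContentPart[],
  },
]
```

The same `messages` array is passed to `streamText` or `generateText` in place of a plain `prompt` string.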
How does Claude 3 Haiku compare to Claude 3.5 Haiku?
Claude 3.5 Haiku matched Claude 3 Opus on many intelligence benchmarks while maintaining Haiku-class speed. The original Claude 3 Haiku is faster on a per-token basis but operates at a lower capability tier. Choose based on whether throughput or reasoning depth matters more.
How much does processing an image cost?
Exact per-image cost depends on image resolution and token count. Check the pricing panel on this page for current rates.
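For a ballpark figure, Anthropic's vision documentation gives the approximation tokens ≈ (width × height) / 750, with image tokens billed at input-token rates. A sketch of the estimate (the image dimensions are illustrative):

```typescript
// Rough per-image cost using Anthropic's documented approximation:
// image tokens ≈ (width_px × height_px) / 750, billed as input tokens.
const INPUT_PER_MTOK = 0.25 // Claude 3 Haiku input rate, USD per 1M tokens

function imageCostUsd(widthPx: number, heightPx: number): number {
  const tokens = (widthPx * heightPx) / 750
  return (tokens / 1_000_000) * INPUT_PER_MTOK
}

// A 1092×1092 image is about 1,590 tokens — roughly $0.0004 at Haiku rates,
// which is what makes million-image annotation runs feasible.
const perImage = imageCostUsd(1092, 1092)
```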
How do I access Claude 3 Haiku through AI Gateway?
Configure your API key in your AI Gateway project settings. AI Gateway routes requests to the supported providers (anthropic, bedrock, vertexAnthropic) and handles authentication, retries, and failover. Use the identifier anthropic/claude-3-haiku in your API calls.
Can Claude 3 Haiku handle tool use and agentic workflows?
Haiku handles structured tool calling and simple multi-step tasks. For complex agentic workflows that require deep planning, later Sonnet or Haiku generations offer stronger instruction following and tool use accuracy.
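Structured tool calling means declaring each tool with a name, description, and JSON Schema for its input. A sketch in the Anthropic Messages API `tools` shape (the tool name and schema here are made up for illustration):

```typescript
// Illustrative tool definition in the Anthropic Messages API shape.
// `lookup_order` and its schema are hypothetical, not a real API.
const lookupOrderTool = {
  name: 'lookup_order',
  description: 'Fetch the status of a customer order by its ID.',
  input_schema: {
    type: 'object',
    properties: {
      order_id: { type: 'string', description: 'Internal order identifier' },
    },
    required: ['order_id'],
  },
}
```

The model returns a structured call with an `order_id` argument when it decides the tool is needed; your code executes it and feeds the result back as a tool-result message.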