Claude Opus 4
Claude Opus 4 is a coding model from Anthropic. It scored 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, sustains performance on multi-hour agentic tasks, and supports hybrid extended thinking with tool use.
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-opus-4',
  prompt: 'Why is the sky blue?',
})

Frequently Asked Questions
What SWE-bench and Terminal-bench scores did Claude Opus 4 achieve?
Opus 4 scored 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, both without extended thinking.
How long can Claude Opus 4 run an agentic task without losing coherence?
Rakuten validated a seven-hour independent run on a demanding open-source refactoring task with sustained performance. Anthropic described the model as capable of working continuously for several hours.
What is extended thinking with tool use in Claude Opus 4?
A beta capability introduced with the Claude 4 launch. The model alternates between extended reasoning and tool calls within a single session. For example, it can think about a problem, run a web search, reason about the results, search again, and synthesize across the chain.
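The alternation described above can be pictured as a loop in which each step is either reasoning, a tool call, or a final answer. The following is a minimal sketch of that control flow only, assuming a scripted stand-in for the model. The names `Step`, `Model`, `Tools`, and `runSession` are hypothetical and do not come from the AI SDK or the Anthropic API.

```typescript
// Hypothetical types for illustration; not part of any real SDK.
type Step =
  | { kind: 'think'; text: string }
  | { kind: 'tool_call'; tool: string; input: string }
  | { kind: 'answer'; text: string }

type Model = (history: string[]) => Step
type Tools = Record<string, (input: string) => string>

// Runs one session: the model keeps emitting steps until it answers
// or the step budget is exhausted.
function runSession(
  model: Model,
  tools: Tools,
  maxSteps = 10,
): { answer: string | null; history: string[] } {
  const history: string[] = []
  for (let i = 0; i < maxSteps; i++) {
    const step = model(history)
    if (step.kind === 'think') {
      // Reasoning is recorded so later steps can build on it.
      history.push(`think: ${step.text}`)
    } else if (step.kind === 'tool_call') {
      // Tool output is fed back into the same session.
      const run = tools[step.tool]
      const output = run ? run(step.input) : `unknown tool ${step.tool}`
      history.push(`tool ${step.tool}(${step.input}) -> ${output}`)
    } else {
      return { answer: step.text, history }
    }
  }
  return { answer: null, history }
}

// Scripted stand-in for the model: think, search, think, search, answer.
const script: Step[] = [
  { kind: 'think', text: 'Need background on scattering' },
  { kind: 'tool_call', tool: 'search', input: 'rayleigh scattering' },
  { kind: 'think', text: 'Check sunsets too' },
  { kind: 'tool_call', tool: 'search', input: 'why are sunsets red' },
  { kind: 'answer', text: 'Shorter wavelengths scatter more.' },
]
let cursor = 0
const scriptedModel: Model = () => script[cursor++]

// Stub tool; a real deployment would call an actual search backend here.
const tools: Tools = { search: (q) => `results for "${q}"` }

const session = runSession(scriptedModel, tools)
```

The step budget (`maxSteps`) is a design choice in this sketch so that a session always terminates, even if the model never produces an answer.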
How did Claude Opus 4 improve memory capabilities?
When you provide local file access, Opus 4 creates and maintains memory files to store key facts and context. This enables better long-term coherence on extended tasks. Anthropic illustrated this with the model creating a navigation guide during autonomous Pokémon gameplay.
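To make the idea of a memory file concrete, here is a minimal sketch of a file-backed store that an application might expose to the model as read and write tools. It assumes a simple one-fact-per-line `key: value` format. The helper names `readMemory` and `remember` and the file name `NOTES.txt` are hypothetical, and the source does not describe the format Opus 4 actually uses.

```typescript
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

// Hypothetical helper: load all facts from a memory file.
// A missing file is treated as an empty memory.
function readMemory(filePath: string): Map<string, string> {
  const facts = new Map<string, string>()
  if (!fs.existsSync(filePath)) return facts
  const lines = fs.readFileSync(filePath, 'utf8').split('\n')
  for (const line of lines) {
    // Assumes keys never contain ": " and values never contain newlines.
    const sep = line.indexOf(': ')
    if (sep > 0) facts.set(line.slice(0, sep), line.slice(sep + 2))
  }
  return facts
}

// Hypothetical helper: insert or overwrite one fact, then rewrite the file.
function remember(filePath: string, key: string, value: string): void {
  const facts = readMemory(filePath)
  facts.set(key, value)
  const lines: string[] = []
  facts.forEach((v, k) => lines.push(`${k}: ${v}`))
  fs.writeFileSync(filePath, lines.join('\n') + '\n', 'utf8')
}

// Usage: facts written during one step are available to a later step
// that only has access to the file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'memory-'))
const notesPath = path.join(dir, 'NOTES.txt')
remember(notesPath, 'goal', 'refactor the parser module')
remember(notesPath, 'blocked_on', 'failing test in lexer')
remember(notesPath, 'blocked_on', 'nothing')
const recalled = readMemory(notesPath)
```

Keeping the facts in a plain file rather than in the prompt means they can outlive a single context window, which is the property the long-running task scenario relies on.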
How much did the Claude 4 models reduce shortcut-taking behavior?
Claude 4 models (Opus 4 and Sonnet 4) are 65% less likely to use shortcuts or loopholes to complete agentic tasks compared to Sonnet 3.7. This is a reliability improvement for production deployments where you need the model to solve the actual problem rather than gaming the metric.
How does Opus 4 pricing compare to Sonnet 4?
Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves Claude Opus 4.
Does Claude Opus 4 support thinking summaries?
Yes. A smaller model condenses lengthy thought processes into summaries. Anthropic noted this is only needed about 5% of the time, when thoughts are too long to display in full.