Skip to content

Claude Opus 4

Claude Opus 4 is a coding model from Anthropic with strong benchmark scores, including 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, with sustained performance on multi-hour agentic tasks and hybrid extended thinking with tool use.

File InputReasoningTool UseVision (Image)Explicit Caching
index.ts
import { streamText } from 'ai'
const result = streamText({
model: 'anthropic/claude-opus-4',
prompt: 'Why is the sky blue?'
})

Frequently Asked Questions

  • What SWE-bench and Terminal-bench scores did Claude Opus 4 achieve?

    Opus 4 scored 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, both without extended thinking.

  • How long can Claude Opus 4 run an agentic task without losing coherence?

    Rakuten validated a seven-hour independent run on a demanding open-source refactoring task with sustained performance. Anthropic described the model as capable of working continuously for several hours.

  • What is extended thinking with tool use in Claude Opus 4?

    A beta capability introduced with the Claude 4 launch. The model alternates between extended reasoning and tool calls within a single session. For example, it can think about a problem, run a web search, reason about the results, search again, and synthesize across the chain.

  • How did Claude Opus 4 improve memory capabilities?

    When you provide local file access, Opus 4 creates and maintains memory files to store key facts and context. This enables better long-term coherence on extended tasks. Anthropic illustrated this with the model creating a navigation guide during autonomous Pokémon gameplay.

  • What was the shortcut-taking behavior reduction?

    Claude 4 models (Opus 4 and Sonnet 4) are 65% less likely to use shortcuts or loopholes to complete agentic tasks compared to Sonnet 3.7. This is a reliability improvement for production deployments where you need the model to solve the actual problem rather than gaming the metric.

  • How does Opus 4 pricing compare to Sonnet 4?

    Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves Claude Opus 4.

  • Does Claude Opus 4 support thinking summaries?

    Yes. A smaller model condenses lengthy thought processes into summaries. Anthropic noted this is only needed about 5% of the time, when thoughts are too long to display in full.