Claude Sonnet 4.5
Claude Sonnet 4.5 is a coding-focused model from Anthropic with strong benchmark scores, including 77.2% on SWE-bench Verified and 61.4% on OSWorld for computer use. It sustains agentic coding sessions of more than 30 hours and delivers substantial gains across coding, reasoning, math, and domain-specific expertise.
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-sonnet-4.5',
  prompt: 'Why is the sky blue?',
})

Frequently Asked Questions
What was Claude Sonnet 4.5's OSWorld score and why does it matter?
61.4%, up from Sonnet 4's 42.2% four months earlier. OSWorld measures AI performance on real-world computer tasks: navigating software, filling forms, and clicking UI elements. It focuses on operational computer-use scenarios rather than abstract reasoning alone.
How long can Claude Sonnet 4.5 maintain focus on a single agentic coding task?
More than 30 hours on complex, multi-step tasks. Anthropic noted this duration changes what's architecturally feasible for autonomous engineering work. Individual results vary by task structure.
What is ASL-3 and why does it apply to Sonnet 4.5?
ASL-3 (AI Safety Level 3) is Anthropic's framework level for models requiring additional safeguards. Sonnet 4.5 is the first Claude model released under ASL-3 protections, which include classifiers screening inputs and outputs for CBRN-related content. False positive rates have decreased by a factor of 10 since initial deployment.
What is the Claude Agent SDK and how does it relate to this model?
The Claude Agent SDK launched alongside Sonnet 4.5. It gives you access to the same agent infrastructure that powers Claude Code: memory management across long tasks, permission systems, and subagent coordination. Use it to build custom agents on the same foundation.
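A minimal sketch of what building on the SDK can look like, assuming the Claude Agent SDK's documented `query` API; the prompt, tool names, and option values here are illustrative, not from this page:

```typescript
import { query } from '@anthropic-ai/claude-agent-sdk'

// query() runs an agentic loop (the same infrastructure behind Claude Code)
// and yields messages as the agent works.
for await (const message of query({
  prompt: 'Find and fix the failing unit test in this repository',
  options: {
    allowedTools: ['Read', 'Edit', 'Bash'], // permission system: restrict what the agent may do
    maxTurns: 20, // bound the autonomous loop
  },
})) {
  if (message.type === 'result') {
    console.log(message) // final result message when the task completes
  }
}
```

Running this requires the SDK installed and an Anthropic API key in the environment; the permission options are where the SDK's guardrails for long-running agents come in.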
What alignment improvements came with Sonnet 4.5?
Substantial reductions in sycophancy, deception, power-seeking, encouragement of delusional thinking, and compliance with harmful system prompts, as measured by an automated behavioral auditor. The model also shows stronger defenses against prompt injection attacks in computer-use and agentic settings.
Why did specialists in finance, law, medicine, and STEM find Sonnet 4.5 significantly better than previous models?
In Anthropic's expert evaluations, professionals assessed domain-specific knowledge and reasoning and rated Sonnet 4.5 substantially better than older models, including Opus 4.1. The intelligence improvements extend beyond coding benchmarks.
Is Sonnet 4.5 priced differently from Sonnet 4?
Current pricing is shown on this page. AI Gateway routes requests across multiple providers, and rates may vary by provider.