Claude 3.5 Sonnet
The upgraded Claude 3.5 Sonnet (October 2024) is the first frontier AI model to offer computer use in public beta. It raises SWE-bench Verified from 33.4% to 49.0% and improves coding and tool use across the board, at the same price as its predecessor.
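import { streamText } from 'ai'
const result = streamText({ model: 'anthropic/claude-3.5-sonnet', prompt: 'Why is the sky blue?' })
// Print text as it streams in (top-level await requires an ES module)
for await (const textPart of result.textStream) process.stdout.write(textPart)
About Claude 3.5 Sonnet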
The updated Claude 3.5 Sonnet launched on October 22, 2024, with two distinct advances. First, coding and tool use improved substantially: SWE-bench Verified rose from 33.4% to 49.0%, higher than every publicly available model at the time, including specialized agentic coding systems. TAU-bench tool use scores rose from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the more challenging airline domain. GitLab reported up to 10% stronger reasoning with no added latency, and Cognition observed substantial improvements in coding, planning, and problem-solving.
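The benchmark jumps above are easier to compare as absolute (percentage-point) and relative gains. The following is a minimal TypeScript sketch; the `Score` and `Gain` types and the `describeGain` helper are hypothetical names for illustration, and the only data used are the scores quoted above.

```typescript
// Hypothetical helper types for comparing before/after benchmark scores.
interface Score {
  benchmark: string
  before: number // percent, previous Claude 3.5 Sonnet
  after: number // percent, upgraded Claude 3.5 Sonnet
}

interface Gain {
  benchmark: string
  points: string // absolute change in percentage points
  relative: string // change as a percentage of the old score
}

function describeGain({ benchmark, before, after }: Score): Gain {
  const points = after - before
  return {
    benchmark,
    points: points.toFixed(1),
    relative: ((points / before) * 100).toFixed(1),
  }
}

// Scores as reported in the text above.
const scores: Score[] = [
  { benchmark: 'SWE-bench Verified', before: 33.4, after: 49.0 },
  { benchmark: 'TAU-bench retail', before: 62.6, after: 69.2 },
  { benchmark: 'TAU-bench airline', before: 36.0, after: 46.0 },
]

const gains = scores.map(describeGain)
for (const g of gains) {
  // e.g. "SWE-bench Verified: +15.6 pts (+46.7% relative)"
  console.log(`${g.benchmark}: +${g.points} pts (+${g.relative}% relative)`)
}
```

In relative terms the SWE-bench Verified jump (about 47%) is far larger than the retail TAU-bench gain (about 10.5%), while the airline domain improves by roughly 28%.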
Second, computer use arrived. Anthropic uses the term for the model's ability to operate a computer the way a person does: perceiving the screen through screenshots, moving a cursor, clicking interface elements, and typing text. Claude 3.5 Sonnet was the first frontier model to offer this in public beta. On OSWorld, a benchmark of real-world computer tasks, it scored 14.9% in the screenshot-only category, nearly double the next-best AI system's 7.8%, and 22.0% when allowed more steps.
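The perceive-then-act cycle described above can be sketched as a loop that executes one action at a time and feeds an observation back. This is a minimal TypeScript sketch under stated assumptions: the action names are loosely modeled on those of Anthropic's computer use tool (`screenshot`, `mouse_move`, `left_click`, `type`, `key`), but `FakeDesktop`, `execute`, and the scripted `plan` are hypothetical scaffolding. A real agent would receive each action from the model through the API and send back an actual screenshot image.

```typescript
// Action shapes loosely modeled on Anthropic's computer use tool actions.
// Everything else in this file is hypothetical, not Anthropic's API.
type Action =
  | { action: 'screenshot' }
  | { action: 'mouse_move'; coordinate: [number, number] }
  | { action: 'left_click' }
  | { action: 'type'; text: string }
  | { action: 'key'; text: string }

// Hypothetical stand-in for a real desktop: tracks cursor, clicks, and typed text.
class FakeDesktop {
  cursor: [number, number] = [0, 0]
  typed = ''
  clicks: Array<[number, number]> = []
  screenshots = 0

  // A real agent would return image bytes here for the model to "see".
  capture(): string {
    this.screenshots++
    return `screenshot #${this.screenshots} (cursor at ${this.cursor.join(',')})`
  }
}

// Execute one model-chosen action and return the observation for the next turn.
function execute(desktop: FakeDesktop, a: Action): string {
  switch (a.action) {
    case 'screenshot':
      return desktop.capture()
    case 'mouse_move':
      desktop.cursor = [a.coordinate[0], a.coordinate[1]]
      return `moved to ${a.coordinate.join(',')}`
    case 'left_click':
      desktop.clicks.push([desktop.cursor[0], desktop.cursor[1]])
      return `clicked at ${desktop.cursor.join(',')}`
    case 'type':
      desktop.typed += a.text
      return `typed ${JSON.stringify(a.text)}`
    case 'key':
      return `pressed ${a.text}`
  }
}

// A scripted plan stands in for the actions the model would emit turn by turn.
const plan: Action[] = [
  { action: 'screenshot' },
  { action: 'mouse_move', coordinate: [640, 360] },
  { action: 'left_click' },
  { action: 'type', text: 'quarterly report' },
  { action: 'key', text: 'Return' },
  { action: 'screenshot' },
]

const desktop = new FakeDesktop()
const observations = plan.map((a) => execute(desktop, a))
observations.forEach((line) => console.log(line))
```

Keeping execution separate from decision-making mirrors the split Anthropic describes: the model only chooses actions, while the developer's environment carries them out and returns what the screen now shows.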
Anthropic stated explicitly that computer use was still experimental: capable, but at times cumbersome and error-prone. Companies including Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company were already building with it; Replit, for example, used it to evaluate apps as they are being built, automating UI navigation across dozens of steps. Both the computer use API and the upgraded model were available from day one on the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI, at the same price as the previous version, so the capability gains came without a cost increase.