
Claude 3.5 Sonnet

The upgraded Claude 3.5 Sonnet (October 2024) is the first frontier AI model to offer computer use in public beta. It raises SWE-bench Verified from 33.4% to 49.0% and improves coding and tool use across the board, at the same price as its predecessor.

Capabilities: File Input, Tool Use, Vision (Image), Explicit Caching
index.ts
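import { streamText } from 'ai'
// The 'provider/model' string selects Claude 3.5 Sonnet.
const result = streamText({
  model: 'anthropic/claude-3.5-sonnet',
  prompt: 'Why is the sky blue?'
})
// Print text chunks to stdout as they arrive.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}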

Frequently Asked Questions

  • What is computer use and how does it work in Claude 3.5 Sonnet?

    Computer use is an API capability that lets Claude operate a computer the way a person does: it perceives the screen through screenshots, moves a cursor, clicks buttons, and types text. Developers integrate the API and pass instructions such as "fill out this form using data from my spreadsheet," which Claude breaks down into individual computer commands that the developer's application then executes.

  • Was computer use production-ready?

    No. Anthropic explicitly described it as experimental at the October 2024 launch: capable but at times cumbersome and error-prone. They released it early to gather developer feedback, expecting rapid improvement.

  • How much did SWE-bench Verified improve between the June and October 2024 Claude 3.5 Sonnet versions?

    The October upgrade moved the score from 33.4% to 49.0%, which Anthropic stated was higher than all publicly available models at that time, including other high-performing reasoning models and specialized agentic coding systems.

  • Did the October 2024 upgrade change Claude 3.5 Sonnet's pricing?

    No. Anthropic released the upgraded model at the same input and output token prices as its predecessor, and the context window was unchanged.

  • What companies were using computer use?

    Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company were building with computer use at launch. Replit used it to evaluate apps as they were being built, and The Browser Company applied it to automating web-based workflows.

  • How does this version differ from the June 2024 Claude 3.5 Sonnet (claude-3.5-sonnet-20240620)?

    The October upgrade added computer use capabilities and significantly improved coding and tool use benchmarks. The June version lacked computer use entirely and had lower SWE-bench scores. Both versions share the same model family name but are distinct checkpoints.

  • What tool use improvements came with this upgrade?

    TAU-bench tool use scores improved from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain, reflecting gains in handling structured multi-step agentic interactions.
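The computer use answer in the FAQ above describes the request flow only in prose. The TypeScript sketch below builds, but does not send, a Messages API request for the October 2024 checkpoint. The endpoint, headers, `claude-3-5-sonnet-20241022` model ID, and `computer_20241022` tool definition follow Anthropic's launch-era documentation as best understood here and should be checked against the current docs; `buildComputerUseRequest` and `ComputerUseRequest` are hypothetical names, and the 1024x768 display size is an arbitrary example.

```typescript
// Hypothetical helper: builds (but does not send) a Messages API request
// that enables the October 2024 computer use tool for Claude 3.5 Sonnet.
interface ComputerUseRequest {
  url: string;
  headers: Record<string, string>;
  body: {
    model: string;
    max_tokens: number;
    tools: Array<Record<string, unknown>>;
    messages: Array<{ role: "user"; content: string }>;
  };
}

function buildComputerUseRequest(
  instruction: string,
  apiKey: string,
  displayWidthPx = 1024, // example screen size, not a requirement
  displayHeightPx = 768,
): ComputerUseRequest {
  return {
    url: "https://api.anthropic.com/v1/messages",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      // At launch, computer use was gated behind this beta header.
      "anthropic-beta": "computer-use-2024-10-22",
      "content-type": "application/json",
    },
    body: {
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 1024,
      tools: [
        {
          type: "computer_20241022",
          name: "computer",
          display_width_px: displayWidthPx,
          display_height_px: displayHeightPx,
          display_number: 1,
        },
      ],
      messages: [{ role: "user", content: instruction }],
    },
  };
}

const req = buildComputerUseRequest(
  "Fill out this form using data from my spreadsheet.",
  "YOUR_API_KEY", // placeholder, not a real key
);
console.log(JSON.stringify(req.body.tools[0]));
```

Claude replies with `tool_use` blocks whose inputs name an action, for example `{"action": "screenshot"}` or `{"action": "mouse_move", "coordinate": [x, y]}`. The developer's own code performs each action and sends the outcome back as a `tool_result`, repeating until the task is finished.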
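The benchmark answers in the FAQ above quote before-and-after scores, and it is easy to confuse a gain in percentage points with a relative gain. This small sketch computes both for the three figures given, using only the numbers from this page; the `improvement` function and `BenchmarkScore` type are illustrative names.

```typescript
// Scores (percent) quoted above: June 2024 checkpoint vs October 2024 checkpoint.
interface BenchmarkScore {
  name: string;
  before: number;
  after: number;
}

const scores: BenchmarkScore[] = [
  { name: "SWE-bench Verified", before: 33.4, after: 49.0 },
  { name: "TAU-bench (retail)", before: 62.6, after: 69.2 },
  { name: "TAU-bench (airline)", before: 36.0, after: 46.0 },
];

// Absolute gain in percentage points and relative gain in percent,
// both rounded to one decimal place to hide floating-point noise.
function improvement(s: BenchmarkScore): { points: number; relativePct: number } {
  const points = s.after - s.before;
  return {
    points: Math.round(points * 10) / 10,
    relativePct: Math.round((points / s.before) * 1000) / 10,
  };
}

for (const s of scores) {
  const { points, relativePct } = improvement(s);
  console.log(`${s.name}: +${points} pp (${relativePct}% relative)`);
}
```

For SWE-bench Verified this gives +15.6 percentage points, roughly a 46.7% relative increase; the airline domain of TAU-bench shows the largest relative tool use gain (+10 points, about 27.8%).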
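Because the June and October 2024 releases are distinct checkpoints under the same family name, as the FAQ above notes, code that depends on computer use should pin the dated model ID rather than rely on the family name. The sketch below assumes Anthropic's direct-API IDs `claude-3-5-sonnet-20240620` and `claude-3-5-sonnet-20241022`; the `CHECKPOINTS` table and `pickModel` function are hypothetical.

```typescript
// Hypothetical lookup table: the two Claude 3.5 Sonnet checkpoints, their dated
// Anthropic API model IDs, and whether each supports computer use.
type Checkpoint = "june-2024" | "october-2024";

interface CheckpointInfo {
  anthropicModelId: string;
  computerUse: boolean;
}

const CHECKPOINTS: Record<Checkpoint, CheckpointInfo> = {
  "june-2024": { anthropicModelId: "claude-3-5-sonnet-20240620", computerUse: false },
  "october-2024": { anthropicModelId: "claude-3-5-sonnet-20241022", computerUse: true },
};

// Return a pinned model ID, refusing a checkpoint that lacks a required capability.
function pickModel(checkpoint: Checkpoint, needsComputerUse: boolean): string {
  const info = CHECKPOINTS[checkpoint];
  if (needsComputerUse && !info.computerUse) {
    throw new Error(`${info.anthropicModelId} does not support computer use`);
  }
  return info.anthropicModelId;
}

console.log(pickModel("october-2024", true)); // claude-3-5-sonnet-20241022
```

Failing fast on the June checkpoint surfaces a missing capability at startup instead of as a confusing error partway through an agent run.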