Claude 3.7 Sonnet is here—here's what we know
Claude 3.7 Sonnet is an LLM that combines both general-purpose and reasoning tasks into a single model to take on the likes of o3, Grok 3, and DeepSeek-R1.
Anthropic has launched Claude 3.7 Sonnet, a new language model that can perform both general and reasoning tasks. The new model comes on the heels of large reasoning models (LRMs) released by OpenAI (o3), Google (Gemini 2.0 Flash Thinking), xAI (Grok 3 Reasoning), and DeepSeek (R1).
What is Claude 3.7 Sonnet?
While other AI labs separate their general-purpose LLMs from their LRMs, Anthropic has decided to ship both into a single model to provide a “seamless experience for users.” Other labs have acknowledged the frustration of having to choose between models, and are aiming to create a seamless experience.
However, it seems that Claude 3.7 Sonnet has merged both capabilities into a single model. When using the model, users can turn on the “extended thinking mode,” which causes Claude 3.7 to generate a chain-of-thought (CoT) sequence before giving the final answer, which is equivalent to what LRMs such as o1 and R1 do. According to Anthropic’s blog, “Extended thinking mode isn’t an option that switches to a different model with a separate strategy. Instead, it’s allowing the very same model to give itself more time, and expend more effort, in coming to an answer.”
The company has not shared any details on how it has trained a single model to perform both forms of inference (though there are studies that show it is possible to train models to choose between different inference modes based on the input prompt).
According to Anthropic, Claude will reveal its thought process “in raw form.” This is in contrast to Grok 3 and o3, which show a summarized version of the CoT to prevent competitors from copying their models. (In the blog post, Anthropic has hinted that they might hide the CoT in future versions: “We’ll weigh the pros and cons of revealing the thought process for future releases. In the meantime, the visible thought process in Claude 3.7 Sonnet should be considered a research preview.”)
According to Anthropic’s own experiments, Claude 3.7 outperforms other frontier models in software engineering (SWE-Bench Verified) and user and tool interaction (TAU-Bench), while also scoring near state-of-the-art on other key benchmarks such as MATH, AIME 2024, and MMLU.
Where can you access Claude 3.7 Sonnet?
Like other models from Anthropic, Claude 3.7 Sonnet is not open source and is only available through Anthropic’s servers. You can access it via the Claude chatbot application, where it is available to all tiers, including free users (though free users won’t be able to use the extended thinking mode and only have access to general-purpose features.)
Claude 3.7 Sonnet is also available through Anthropic API, Amazon Bedrock, and Google Vertex AI. Claude 3.7 Sonnet costs $3 per million input tokens and $15 per million output tokens. It is much cheaper than OpenAI o1 but is almost four times as expensive as o3-mini (you can get a huge discount if your application allows you to use prompt caching).
The API allows developers to specify a limit for the reasoning tokens to prevent the model from draining their budget.