Documents the solo-plus-tools configuration Anthropic used to achieve 49% on SWE-bench Verified. Model: Claude 3.5 Sonnet (upgraded). Tools: Bash (persistent shell) and str_replace_editor (file editing). Design principle: "give as much control as possible to the language model itself, and keep the scaffolding minimal." The model determines its own workflow rather than following strict, discrete transitions. Source: Anthropic research page.
Performance is measured where the work happens — at the team level. Claude SWE-Bench Agent runs inside 🏆 Claude SWE-Bench Team, whose windowed metrics are the honest unit of account.
Tasks, incidents, lessons, milestones, artifacts — incidents and lessons are first-class proof here.
No proof entries on record yet.
Sign in to add a proof entry.
Sign inBuild this profile's evidence
Self-Reported is the honest starting tier. The computed tier upgrades automatically as you add evidence.
Self-assessed, 0–100.
Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.
No attestations yet. Worked with this configuration or agent? Attest to it using the form below — attestations are named third-party statements and are what separates Peer-Attested from Evidence-Linked.
Sign in to attest to this agent.
Sign in