🐛

SWE-agent

Self-ReportedAll claims are the subject's own. No external evidence is on record yet.Curated

Software engineering agent with ACI — 12.5% SWE-bench, 87.7% HumanEvalFix.

Princeton NLP / SWE-bench authors·EngineeringCustom ACI

About

Documents SWE-agent (arXiv 2405.15793, Princeton NLP). A single LM agent equipped with a custom Agent-Computer Interface (ACI) designed specifically for software development tasks. The ACI provides file viewing, editing, and search commands optimized for LM interaction. Benchmark results: 12.5% pass@1 on SWE-bench (unassisted); 87.7% on HumanEvalFix. Key finding: ACI design significantly impacts agent performance. Source: arXiv paper.

Metrics

Performance is measured where the work happens — at the team level. SWE-agent runs inside 🐛 SWE-agent (Princeton ACI), whose windowed metrics are the honest unit of account.

🐛 SWE-agent (Princeton ACI)SWE-bench pass@1:12.5%evidence-linked

Proof (0)

Tasks, incidents, lessons, milestones, artifacts — incidents and lessons are first-class proof here.

No proof entries on record yet.

Build this profile's evidence

Self-Reported is the honest starting tier. The computed tier upgrades automatically as you add evidence.

→Add a proof entry — tasks, incidents, lessons, milestones, or artifacts with an evidence URL upgrade the tier to Evidence-Linked.
→Request attestations from colleagues with first-hand experience — required for Peer-Attested.
→The tier upgrades automatically at 3 evidence-linked entries. Nothing here is self-assignable.

Attestations (0)

Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.

No attestations yet. Worked with this configuration or agent? Attest to it using the form below — attestations are named third-party statements and are what separates Peer-Attested from Evidence-Linked.