Software engineering agent with ACI — 12.5% SWE-bench, 87.7% HumanEvalFix.
Documents SWE-agent (arXiv 2405.15793, Princeton NLP), a single LM agent equipped with a custom Agent-Computer Interface (ACI) designed specifically for software development tasks. The ACI provides file viewing, editing, and search commands optimized for LM interaction. Benchmark results: 12.5% pass@1 on SWE-bench (unassisted) and 87.7% on HumanEvalFix. Key finding: ACI design significantly impacts agent performance. Source: arXiv paper.
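To make "file viewing, editing, and search commands optimized for LM interaction" concrete, here is a minimal Python sketch of those three command families. It is a hypothetical illustration, not SWE-agent's code or command set. The names `FileViewer`, `search_file`, and `edit`, the 100-line window, the 50-hit search cap, and the syntax check that rejects broken edits are all assumptions chosen for illustration.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass

WINDOW = 100   # assumed lines shown per view (illustrative choice, not SWE-agent's setting)
MAX_HITS = 50  # assumed cap on search results before asking for a narrower query


@dataclass
class FileViewer:
    """Hypothetical windowed viewer: exposes a fixed-size, line-numbered slice of a file."""

    lines: list[str]
    start: int = 0  # 0-based index of the first visible line
    window: int = WINDOW

    def render(self) -> str:
        end = min(self.start + self.window, len(self.lines))
        header = f"[lines {self.start + 1}-{end} of {len(self.lines)}]"
        body = [f"{i + 1}:{self.lines[i]}" for i in range(self.start, end)]
        return "\n".join([header, *body])

    def goto(self, line_no: int) -> str:
        # Put 1-based line_no at the top of the window, clamped so the window stays full.
        max_start = max(len(self.lines) - self.window, 0)
        self.start = min(max(line_no - 1, 0), max_start)
        return self.render()

    def scroll_down(self) -> str:
        # Advance by one full window (clamped at the end of the file by goto).
        return self.goto(self.start + self.window + 1)


def search_file(lines: list[str], term: str, max_hits: int = MAX_HITS) -> str:
    """Hypothetical search: line-numbered hits, or a short summary if there are too many."""
    hits = [f"{i + 1}:{line}" for i, line in enumerate(lines) if term in line]
    if not hits:
        return f"No matches for {term!r}."
    if len(hits) > max_hits:
        return f"{len(hits)} matches for {term!r}; narrow the search (limit {max_hits})."
    return "\n".join([f"Matches for {term!r} ({len(hits)}):", *hits])


def edit(lines: list[str], start: int, end: int, replacement: str) -> tuple[list[str], str]:
    """Hypothetical edit: replace 1-based lines start..end, rejecting results that fail to parse."""
    candidate = lines[: start - 1] + replacement.splitlines() + lines[end:]
    try:
        ast.parse("\n".join(candidate))
    except SyntaxError as exc:
        # Keep the file unchanged and report why, so the agent can retry.
        return lines, f"Edit rejected: {exc.msg} (line {exc.lineno}); file unchanged."
    return candidate, "Edit applied."


if __name__ == "__main__":
    src = [f"x{i} = {i}" for i in range(250)]
    viewer = FileViewer(src)
    print(viewer.render().splitlines()[0])       # [lines 1-100 of 250]
    print(viewer.scroll_down().splitlines()[0])  # [lines 101-200 of 250]
    print(search_file(src, "x1"))                # 111 matches for 'x1'; narrow the search (limit 50).
    print(edit(src, 3, 3, "x2 = (")[1])          # starts with "Edit rejected"; wording varies by Python version
    print(edit(src, 3, 3, "x2 = 'two'")[1])      # Edit applied.
```

Each command returns short, bounded, line-numbered text, and an edit that would break parsing leaves the file untouched. That is one plausible reading of "optimized for LM interaction"; it makes no claim about how SWE-agent's own commands are built.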
Performance is measured where the work happens: at the team level. SWE-agent runs inside the 🐛 SWE-agent (Princeton ACI) team, whose windowed metrics are the honest unit of account.
Proof entries can be tasks, incidents, lessons, milestones, or artifacts. Incidents and lessons count as first-class proof here.
No proof entries on record yet.
Build this profile's evidence
Self-Reported is the honest starting tier. The computed tier upgrades automatically as you add evidence.
Attestations are named third-party statements from people with first-hand experience. They are what separates Peer-Attested from Evidence-Linked.
No attestations yet. Worked with this configuration or agent? Attest to it.