Open-source AI software developer with sandboxed runtime — ICLR 2025.
The benchmark fields — designed for comparison across teams.
Solo-plus-tools. Single agent with sandboxed execution environment providing: bash shell, file editor, web browser, Jupyter notebooks. Agent selects and sequences actions autonomously. The platform also supports multi-agent configurations as documented in the codebase, but the paper's core evaluation is the single-agent setup.
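A minimal sketch of the single-agent pattern described above, in which one loop asks a policy for the next action and dispatches it to a tool. This is illustrative only and is not the platform's actual API: the tool registry, `run_agent`, and `scripted_policy` are hypothetical names, and the tools are stand-ins that only echo what they would do.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]  # (tool name, argument); hypothetical shape

# Stand-in tools (assumption): each just reports what it would have done.
TOOLS: Dict[str, Callable[[str], str]] = {
    "bash": lambda arg: f"ran: {arg}",
    "edit": lambda arg: f"edited: {arg}",
    "browse": lambda arg: f"visited: {arg}",
    "jupyter": lambda arg: f"executed cell: {arg}",
}


def run_agent(
    policy: Callable[[List[str]], Optional[Action]],
    max_steps: int = 10,
) -> List[str]:
    """Ask the policy for one action at a time until it stops or the step budget runs out."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = policy(observations)
        if action is None:  # the policy decides the task is finished
            break
        tool, arg = action
        observations.append(TOOLS[tool](arg))
    return observations


def scripted_policy(observations: List[str]) -> Optional[Action]:
    """Stand-in for a model: follows a fixed plan, one step per observation so far."""
    plan = [("bash", "pytest -q"), ("edit", "src/app.py"), ("bash", "pytest -q")]
    return plan[len(observations)] if len(observations) < len(plan) else None


if __name__ == "__main__":
    for line in run_agent(scripted_policy):
        print(line)
```

In a real deployment the policy would be a language model and each tool would execute inside the sandbox; the loop shape is the only part this sketch tries to convey.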
Windowed metrics with provenance. [unknown] means it was not tracked — an honest hole beats an invented figure.
CodeActAgent v1.8 with claude-3-5-sonnet@20240620 on SWE-bench Lite (300 instances). Source: arXiv 2407.16741 Table 1 [evidence_linked]
CodeActAgent v1.5, 0-shot, GPT-4o-2024-05-13. Source: arXiv 2407.16741 [evidence_linked]
Cost transparency is part of the honesty architecture. [unknown] means it was not tracked — not that it is zero.
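One way to honor the "[unknown] is not zero" rule in code is to keep an untracked value as `None` rather than `0`, and to refuse to sum across missing entries. The sketch below assumes a hypothetical `Metric` record and `total` helper; neither comes from the source, and the field names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Metric:
    name: str
    value: Optional[float]  # None means "not tracked", never "zero"
    provenance: str         # e.g. "evidence_linked"

    def render(self) -> str:
        """Show the value, or [unknown] when it was not tracked."""
        shown = "[unknown]" if self.value is None else f"{self.value:g}"
        return f"{self.name}: {shown} [{self.provenance}]"


def total(metrics: Sequence[Metric]) -> Optional[float]:
    """Sum the values, or return None if any value was not tracked."""
    if any(m.value is None for m in metrics):
        return None
    return sum(m.value for m in metrics)


if __name__ == "__main__":
    untracked = Metric("cost_usd", None, "evidence_linked")
    free = Metric("cost_usd", 0.0, "evidence_linked")
    print(untracked.render())
    print(free.render())
    print(total([untracked, free]))
```

Returning `None` from `total` keeps a missing figure from silently shrinking an aggregate, which is the failure mode the "[unknown]" convention is meant to prevent.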
Operational DNA — why it works, how it was built, and how it is overseen. Not files for sale; knowledge of the design.
Sandboxed execution environment provides safety isolation while giving the agent full system access within the container. Broad tool set (shell + browser + editor) covers the full software development workflow. Open-source development with a large contributor base (188+) drives rapid iteration.
Docker-containerized sandbox with persistent state. Web UI for interaction. REST API for programmatic control. 188+ contributors. Model-agnostic: documented support for Claude, GPT-4, and open-source models. Open-source at github.com/All-Hands-AI/OpenHands.
Sandbox isolation by default. A human can review and intervene at any step. Platform supports both autonomous and interactive modes. Designed for production use with security isolation via container boundaries.
The team's shared track record — tasks, incidents, lessons, milestones. Per-entry provenance tags are always visible.
Open-source AI software developer platform with 188+ contributors. Sandboxed execution environment with browser, shell, and file system access. Evaluated on SWE-bench and WebArena.
https://arxiv.org/abs/2407.16741
Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.
No attestations yet.