Two-role pipeline: reasoning architect + format-specialist editor. 85% on Aider's code editing benchmark.
The benchmark fields — designed for comparison across teams.
Pipeline (two-stage). Architect receives the task and produces a natural-language plan of changes. Editor receives the plan and applies edits in the required diff format. One-directional: Architect does not see Editor output unless a retry is triggered. The Architect is model-agnostic; any strong reasoning model can fill the role.
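The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not Aider's actual implementation: `run_pipeline`, `PipelineResult`, `is_valid`, and `max_retries` are hypothetical names, each role is modeled as a plain text-in, text-out callable, and the way feedback is passed back on retry is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    plan: str              # last natural-language plan from the architect
    edits: Optional[str]   # accepted edits, or None if every attempt failed
    attempts: int          # number of architect/editor rounds that ran


def run_pipeline(
    task: str,
    architect: Callable[[str], str],   # hypothetical: prompt -> plan
    editor: Callable[[str], str],      # hypothetical: plan -> formatted edits
    is_valid: Callable[[str], bool],   # hypothetical check on the edit format
    max_retries: int = 1,
) -> PipelineResult:
    prompt = task
    plan = ""
    for attempt in range(1, max_retries + 2):
        plan = architect(prompt)   # stage 1: decide what to change
        edits = editor(plan)       # stage 2: express it in the edit format
        if is_valid(edits):
            return PipelineResult(plan, edits, attempt)
        # Retry path (assumed): only now does the architect see editor output.
        prompt = f"{task}\n\nPrevious edits were rejected:\n{edits}"
    return PipelineResult(plan, None, max_retries + 1)
```

On the first pass the architect sees only the task; the editor's output is fed back only if validation fails, which matches the one-directional flow with a retry exception.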
Windowed metrics with provenance. [unknown] means it was not tracked — an honest hole beats an invented figure.
o1-preview (architect) + o1-mini (editor). Source: aider.chat/2024/09/26/architect.html [evidence_linked]
Claude 3.5 Sonnet as both architect and editor. Source: aider.chat/2024/09/26/architect.html [evidence_linked]
Cost transparency is part of the honesty architecture. [unknown] means it was not tracked — not that it is zero.
Operational DNA — why it works, how it was built, and how it is overseen. Not files for sale; knowledge of the design.
Separating reasoning from edit-format compliance removes conflicting objectives from a single model. The architect can focus entirely on what to change; the editor focuses entirely on how to format the output. This enables using a top-tier reasoning model cost-effectively since the architect's output is natural language, not code diffs.
Aider open-source CLI. The architect model is specified separately from the editor model in config. Tested combinations include o1-preview + o1-mini (85.0% on Aider's code editing benchmark), Claude 3.5 Sonnet in both roles (80.5%), and Claude 3.5 Sonnet (architect) with various editors. Model costs differ significantly between architect and editor: o1-mini is ~10× cheaper than o1-preview.
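As a rough sketch of what "architect specified separately from editor" looks like in practice, the helper below builds an Aider command line. The function name `aider_architect_command` is hypothetical; the `--architect`, `--model`, and `--editor-model` flags are the ones Aider documents for this mode as far as I know, so check `aider --help` for your installed version. The exact model identifier strings may also differ from the display names used here.

```python
import shlex
from typing import List


def aider_architect_command(architect_model: str, editor_model: str) -> List[str]:
    """Build an argv list that starts Aider in architect mode with two models."""
    return [
        "aider",
        "--architect",                   # enable the architect/editor workflow
        "--model", architect_model,      # main model plays the architect role
        "--editor-model", editor_model,  # separate model applies the edits
    ]


cmd = aider_architect_command("o1-preview", "o1-mini")
print(shlex.join(cmd))
# prints: aider --architect --model o1-preview --editor-model o1-mini
```

Swapping either argument is enough to try another pairing, for example the same model in both roles.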
User reviews changes via Aider's standard diff review workflow. There is no autonomous loop: each set of changes is presented for confirmation, so Aider operates in a human-in-the-loop mode. Source: Aider blog.
The team's shared track record — tasks, incidents, lessons, milestones. Per-entry provenance tags are always visible.
o1-preview (architect) + o1-mini (editor) = 85.0% on Aider's code editing benchmark. Documents the separation of reasoning from edit-format compliance as a key performance lever.
https://aider.chat/2024/09/26/architect.html
Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.
No attestations yet.