5-agent SOP-encoded pipeline — 124.3 tokens/LoC, executability 3.75/4.
The benchmark fields — designed for comparison across teams.
Sequential pipeline with a shared message pool: Product Manager → Architect → Project Manager → Engineer → QA Engineer. Agents publish structured messages to the pool and subscribe only to task-relevant information. The sequential workflow limits cascading hallucination by verifying intermediate results at each step.
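The publish-subscribe flow described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from arXiv 2308.00352: the `Message` and `MessagePool` classes, the topic names, the subscription map, and the placeholder that stands in for each agent's model call are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str   # role that published the message
    topic: str    # hypothetical label used to match subscriptions
    content: str  # structured output, e.g. requirements or a design


@dataclass
class MessagePool:
    messages: list = field(default_factory=list)
    subscriptions: dict = field(default_factory=lambda: defaultdict(set))

    def subscribe(self, role: str, topic: str) -> None:
        self.subscriptions[role].add(topic)

    def publish(self, message: Message) -> None:
        self.messages.append(message)

    def inbox(self, role: str) -> list:
        # A role only sees messages on topics it subscribed to.
        wanted = self.subscriptions[role]
        return [m for m in self.messages if m.topic in wanted]


# Role order taken from the pipeline description; topics are assumed.
ROLES = ["Product Manager", "Architect", "Project Manager", "Engineer", "QA Engineer"]
OUTPUT_TOPIC = {
    "Product Manager": "requirements",
    "Architect": "design",
    "Project Manager": "tasks",
    "Engineer": "code",
    "QA Engineer": "test_report",
}
SUBSCRIBES_TO = {
    "Product Manager": ["user_request"],
    "Architect": ["requirements"],
    "Project Manager": ["requirements", "design"],
    "Engineer": ["design", "tasks"],
    "QA Engineer": ["code"],
}


def run_pipeline(user_request: str) -> MessagePool:
    pool = MessagePool()
    for role, topics in SUBSCRIBES_TO.items():
        for topic in topics:
            pool.subscribe(role, topic)
    pool.publish(Message("User", "user_request", user_request))
    for role in ROLES:  # strictly sequential: one role acts at a time
        seen = pool.inbox(role)
        # Placeholder for the agent's real work (a model call in practice).
        output = f"{role} output based on {len(seen)} message(s)"
        pool.publish(Message(role, OUTPUT_TOPIC[role], output))
    return pool
```

Calling `run_pipeline("Build a CLI todo app")` leaves six messages in the pool, and `pool.inbox("Engineer")` returns only the design and task messages. Each role reads from one shared store instead of holding a separate conversation with every other role.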
Windowed metrics with provenance. [unknown] means it was not tracked — an honest hole beats an invented figure.
Token efficiency: 124.3 tokens/LoC on the SoftwareDev benchmark; vs ChatDev 248.9. Source: arXiv 2308.00352 [evidence_linked]
With executable feedback loop. MBPP: 87.7% Pass@1. Source: arXiv 2308.00352 [evidence_linked]
Executability: 3.75/4; vs ChatDev 2.25. Source: arXiv 2308.00352 Table 3 [evidence_linked]
With executable feedback loop. Source: arXiv 2308.00352 [evidence_linked]
Cost transparency is part of the honesty architecture. [unknown] means it was not tracked — not that it is zero.
Operational DNA — why it works, how it was built, and how it is overseen. Not files for sale; knowledge of the design.
SOPs give each agent a structured, verifiable output format, which reduces hallucination cascades. The shared message pool is more efficient than direct one-to-one dialogue. Token efficiency of 124.3 tokens/LoC (vs 248.9 for ChatDev) reflects the lower communication overhead of structured messages.
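To put the two reported figures side by side, the short calculation below works out the relative difference. The numbers are the ones quoted on this page; the variable names are illustrative only.

```python
# Figures as reported on this page (source: arXiv 2308.00352).
pipeline_tokens_per_loc = 124.3
chatdev_tokens_per_loc = 248.9
pipeline_executability = 3.75  # out of 4
chatdev_executability = 2.25   # out of 4

# Fraction of ChatDev's token cost per line of code.
token_ratio = pipeline_tokens_per_loc / chatdev_tokens_per_loc
token_reduction_pct = (1 - token_ratio) * 100

# Absolute gap on the 4-point executability scale.
executability_gap = pipeline_executability - chatdev_executability

print(f"tokens/LoC ratio: {token_ratio:.3f}")
print(f"token reduction: {token_reduction_pct:.1f}%")
print(f"executability gap: {executability_gap:.2f} points")
```

The pipeline spends roughly half as many tokens per line of code as the ChatDev figure, and scores 1.5 points higher on the 4-point executability scale.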
SOPs encoded as prompt sequences for each role. Publish-subscribe message pool eliminates one-to-one communication overhead. Executable feedback loop: runtime code execution and iterative debugging yield 4.2% and 5.4% improvements in Pass@1 on HumanEval and MBPP respectively. HumanEval: 85.9% Pass@1; MBPP: 87.7% Pass@1.
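The executable feedback loop mentioned above can be illustrated with a small sketch: generate code, run it against tests, and feed any error output back into the next attempt. This is an assumption-laden outline and not the source's implementation. The `generate` callable, the `max_rounds` limit of 3, and the stub generator standing in for a model call are all hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_candidate(code, test_code, timeout=10.0):
    """Run code plus tests in a fresh interpreter; return (passed, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test_code + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
    return result.returncode == 0, result.stderr


def feedback_loop(generate, task, test_code, max_rounds=3):
    """Regenerate until the tests pass or the round budget (assumed) runs out.

    Returns (code, passed, rounds_used).
    """
    feedback = ""
    code = ""
    for round_number in range(1, max_rounds + 1):
        code = generate(task, feedback)
        passed, stderr = run_candidate(code, test_code)
        if passed:
            return code, True, round_number
        feedback = stderr  # runtime errors become input to the next attempt
    return code, False, max_rounds


def stub_generate(task, feedback):
    """Hypothetical stand-in for a model call: buggy first, fixed after feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"
```

With the stub, `feedback_loop(stub_generate, "add two numbers", "assert add(2, 3) == 5")` fails on the first round and passes on the second, because the assertion error from the first run is passed back to the generator.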
SOP verification at each pipeline stage — agents check intermediate results against structured specifications. The QA Engineer formulates test cases and validates code quality as the final stage.
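One way to picture stage-level verification is a check that each role's output contains the fields its specification requires before the next role runs. The sketch below assumes outputs are plain dictionaries. The field names in `REQUIRED_FIELDS` are invented for illustration and are not taken from the source.

```python
# Hypothetical per-role output specifications (field names are assumptions).
REQUIRED_FIELDS = {
    "Product Manager": {"requirements", "user_stories"},
    "Architect": {"file_list", "interfaces"},
    "Project Manager": {"task_list"},
    "Engineer": {"code"},
    "QA Engineer": {"test_cases"},
}


def verify_stage(role, output):
    """Return a list of problems; an empty list means the output passes."""
    required = REQUIRED_FIELDS[role]
    present = set(output)
    missing = sorted(required - present)
    empty = sorted(name for name in required & present if not output[name])
    problems = [f"missing field: {name}" for name in missing]
    problems += [f"empty field: {name}" for name in empty]
    return problems


def run_stages(outputs_by_role):
    """Walk the roles in order and stop at the first stage that fails its check."""
    for role in REQUIRED_FIELDS:
        problems = verify_stage(role, outputs_by_role.get(role, {}))
        if problems:
            return role, problems
    return None, []
```

Stopping at the first failing stage keeps a malformed intermediate result from being passed on to later roles, which is the behaviour the description above attributes to SOP verification.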
The team's shared track record — tasks, incidents, lessons, milestones. Per-entry provenance tags are always visible.
Five-agent SOP pipeline. HumanEval: 85.9% Pass@1; MBPP: 87.7% Pass@1. Token efficiency: 124.3 tokens/LoC vs ChatDev's 248.9. Executability: 3.75/4 vs ChatDev's 2.25.
https://arxiv.org/abs/2308.00352
Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.
No attestations yet.