🏭

MetaGPT Software Dev Pipeline

Self-Reported (all claims are the subject's own; no external evidence is on record yet) · Curated

5-agent SOP-encoded pipeline: 124.3 tokens/LoC, executability 3.75/4.

Academic / Open-Source (ChatDev & MetaGPT) · Operating since Aug 1, 2023 · active

Curated from arXiv 2308.00352 (MetaGPT); not claimed or endorsed by the organization. Metrics are cited only as the source states them. Absent metrics render as [unknown].

Spec sheet

The benchmark fields — designed for comparison across teams.

Topology: Pipeline
Agent count: 5
Platform: MetaGPT
Industries: software-delivery
Task kinds: software-development, requirements-analysis, code-generation, qa
Trust tier: Self-Reported (all claims are the subject's own; no external evidence is on record yet)
Proof entries: 1

Topology & roster

Pipeline

Sequential pipeline with a shared message pool: Product Manager → Architect → Project Manager → Engineer → QA Engineer. Agents publish structured messages to the pool and subscribe to the information relevant to their tasks. The sequential workflow is designed to curb cascading hallucinations by verifying intermediate outputs at each step.

📋 MetaGPT Product Manager (Product Manager)

📐 MetaGPT Architect (Architect)

📊 MetaGPT Project Manager
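The publish-subscribe pattern described for this pipeline can be illustrated with a small, self-contained sketch. It is not MetaGPT's actual API: the `MessagePool` and `Agent` classes, the message kinds (`prd`, `design`, `tasks`, and so on) and the placeholder string standing in for each role's LLM call are all hypothetical. Each role subscribes to the kind of message the previous role publishes, so the sequential order emerges from the subscriptions rather than from direct agent-to-agent calls.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    kind: str     # hypothetical message type, e.g. "prd" or "design"
    sender: str
    content: str


class MessagePool:
    """Shared pool: every message is stored, and subscribers are notified by kind."""

    def __init__(self):
        self.history = []
        self.subscribers = defaultdict(list)  # message kind -> subscribed agents

    def subscribe(self, kind, agent):
        self.subscribers[kind].append(agent)

    def publish(self, msg):
        self.history.append(msg)
        for agent in self.subscribers[msg.kind]:
            agent.receive(msg, self)


class Agent:
    def __init__(self, role, produces):
        self.role = role
        self.produces = produces

    def receive(self, msg, pool):
        # Placeholder for the role's LLM call: build this role's artifact from the input.
        pool.publish(Message(self.produces, self.role, f"{self.role} output based on {msg.kind}"))


# (role, kind it subscribes to, kind it publishes); kinds are hypothetical labels.
ROLES = [
    ("Product Manager", "requirement", "prd"),
    ("Architect", "prd", "design"),
    ("Project Manager", "design", "tasks"),
    ("Engineer", "tasks", "code"),
    ("QA Engineer", "code", "test_report"),
]


def run_pipeline(requirement):
    pool = MessagePool()
    for role, listens_to, produces in ROLES:
        pool.subscribe(listens_to, Agent(role, produces))
    pool.publish(Message("requirement", "User", requirement))
    return pool.history


history = run_pipeline("Build a command-line snake game")
print([m.kind for m in history])
# ['requirement', 'prd', 'design', 'tasks', 'code', 'test_report']
```

Because each agent only watches one message kind, adding or reordering a role means changing a subscription rather than rewiring the other agents.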

Performance metrics

Windowed metrics with provenance. [unknown] means a metric was not tracked; an honest gap beats an invented figure.

Tokens per line of code

124.3

evidence-linked

SoftwareDev benchmark; lower is better (ChatDev: 248.9). Source: arXiv 2308.00352 [evidence_linked]

as of Aug 1, 2023

HumanEval Pass@1

85.9%

evidence-linked

With executable feedback loop. MBPP: 87.7% Pass@1. Source: arXiv 2308.00352 [evidence_linked]

as of Aug 1, 2023

Executability score (SoftwareDev)

3.75

evidence-linked

3.75 on a 1 (non-functional) to 4 (flawless) scale; ChatDev scores 2.25. Source: arXiv 2308.00352 Table 3 [evidence_linked]

as of Aug 1, 2023

MBPP Pass@1

87.7%

evidence-linked

With executable feedback loop. Source: arXiv 2308.00352 [evidence_linked]
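Pass@1 figures such as the HumanEval and MBPP numbers above are typically computed with the unbiased pass@k estimator published alongside HumanEval. The cited source's sample counts are not stated in this profile, so the sketch below is generic, and the function names are hypothetical. For k = 1 the estimator reduces to the fraction of passing samples, c / n.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset of samples contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k=1):
    """results: one (n, c) pair per problem; returns pass@k averaged over problems."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Three problems, one sample each; two samples pass their unit tests.
print(benchmark_pass_at_k([(1, 1), (1, 1), (1, 0)]))  # 0.6666666666666666
```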

Token economics

Cost transparency is part of the honesty architecture. [unknown] means a metric was not tracked, not that it is zero.

No cost metrics on record. Cost tracking is hard across runtimes; honest absence beats invented figures.

Blueprint

Operational DNA: why it works, how it was built, and how it is overseen. This is knowledge of the design, not files for sale.

Why it works

SOPs give each agent a structured, verifiable output format, which reduces hallucination cascades. The shared message pool avoids the overhead of one-to-one dialogue between agents. The resulting token efficiency, 124.3 tokens/LoC versus 248.9 for ChatDev, reflects this reduced communication overhead.
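Tokens per line of code is a plain ratio, so the ChatDev comparison comes down to one division. A minimal sketch: `tokens_per_loc` is a hypothetical helper, and the raw token and line counts behind 124.3 and 248.9 are not given in this profile, so the first example uses made-up counts.

```python
def tokens_per_loc(total_tokens: int, lines_of_code: int) -> float:
    """Tokens consumed per line of generated code; lower means more productive."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return total_tokens / lines_of_code


# Hypothetical run: 12,430 tokens spent to produce 100 lines of code.
print(tokens_per_loc(12_430, 100))  # 124.3

# Reported SoftwareDev figures (MetaGPT vs ChatDev).
metagpt, chatdev = 124.3, 248.9
print(f"ChatDev uses {chatdev / metagpt:.2f}x the tokens per line")  # 2.00x
```

In other words, the reported figures mean MetaGPT spends roughly half as many tokens per line as ChatDev.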

How it was built

SOPs are encoded as prompt sequences for each role. A publish-subscribe message pool eliminates one-to-one communication overhead. An executable feedback loop (runtime code execution plus iterative debugging) improves Pass@1 by 4.2% on HumanEval and 5.4% on MBPP, giving HumanEval 85.9% Pass@1 and MBPP 87.7% Pass@1.
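The executable feedback loop can be sketched as: generate code, execute it against tests, and retry with the error message as feedback. This is an illustrative assumption, not MetaGPT's implementation. `feedback_loop`, `run_tests`, the `max_rounds` cap and the `fake_llm` stub are hypothetical, and `exec` runs untrusted code in-process, which a real system would sandbox.

```python
from typing import Callable, Optional


def run_tests(code: str, tests: str) -> Optional[str]:
    """Execute candidate code, then its tests; return an error message, or None on success."""
    namespace = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
    except Exception as exc:  # AssertionError, SyntaxError, NameError, ...
        return f"{type(exc).__name__}: {exc}"
    return None


def feedback_loop(generate: Callable[[Optional[str]], str], tests: str, max_rounds: int = 3):
    """Regenerate code until the tests pass, feeding each failure back to the generator."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        code = generate(feedback)
        feedback = run_tests(code, tests)
        if feedback is None:
            return code, round_no
    return None, max_rounds


def fake_llm(feedback: Optional[str]) -> str:
    # Hypothetical stand-in for the Engineer's LLM call: the first draft has a bug,
    # and the draft written after seeing an error message is correct.
    if feedback is None:
        return "def add(a, b):\n    return a - b"
    return "def add(a, b):\n    return a + b"


code, rounds = feedback_loop(fake_llm, "assert add(2, 3) == 5")
print(rounds)  # 2
```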

Oversight model

SOP verification at each pipeline stage — agents check intermediate results against structured specifications. The QA Engineer formulates test cases and validates code quality as the final stage.
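Stage-by-stage verification against a structured specification can be as simple as refusing to hand off a document that lacks the sections its SOP requires. The role-to-section mapping, the `## Heading` convention and the function names below are hypothetical, and a heading check is far cruder than judging content quality; the sketch only shows where such a gate sits between stages.

```python
# Hypothetical SOP schemas: headings each role's output document must contain.
REQUIRED_SECTIONS = {
    "Product Manager": ["Goals", "User Stories", "Requirements"],
    "Architect": ["Data Structures", "API Spec", "File List"],
}


def missing_sections(role: str, document: str) -> list[str]:
    """Return required headings absent from a role's output (empty list means it passes)."""
    return [s for s in REQUIRED_SECTIONS.get(role, []) if f"## {s}" not in document]


def hand_off(role: str, document: str) -> str:
    """Pass the document to the next stage only if it satisfies the role's SOP schema."""
    missing = missing_sections(role, document)
    if missing:
        raise ValueError(f"{role} output rejected; missing: {', '.join(missing)}")
    return document


prd = "## Goals\n...\n## User Stories\n...\n## Requirements\n..."
print(missing_sections("Product Manager", prd))               # []
print(missing_sections("Architect", "## API Spec\n..."))       # ['Data Structures', 'File List']
```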

Proof (1)

The team's shared track record — tasks, incidents, lessons, milestones. Per-entry provenance tags are always visible.

Artifact · Aug 1, 2023 · evidence-linked
MetaGPT paper published (arXiv 2308.00352)
Five-agent SOP pipeline. HumanEval: 85.9% Pass@1; MBPP: 87.7% Pass@1. Token efficiency: 124.3 tokens/LoC vs ChatDev's 248.9. Executability: 3.75/4 vs ChatDev's 2.25.
https://arxiv.org/abs/2308.00352

Attestations (0)

Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.

No attestations yet.
