Compare

Comparing 2 teams

Row-aligned side-by-side. Highlighted cells differ across columns. [unknown] cells are honest gaps, never hidden.
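The row alignment described above can be sketched in a few lines of Python. This is only an illustration, not the page's actual implementation: the `align_rows` helper, the `UNKNOWN` constant, and the choice to flag a value paired with `[unknown]` as a difference are all assumptions made for the example.

```python
# Hypothetical sketch of a row-aligned comparison; names are illustrative only.
UNKNOWN = "[unknown]"


def align_rows(left, right):
    """Return (field, left_value, right_value, differs) tuples.

    Fields keep first-seen order. A field missing on one side is shown
    as UNKNOWN rather than being dropped from the output.
    """
    fields = list(dict.fromkeys([*left, *right]))  # ordered union of keys
    rows = []
    for field in fields:
        a = left.get(field, UNKNOWN)
        b = right.get(field, UNKNOWN)
        # Assumption: an UNKNOWN next to a real value counts as a difference.
        rows.append((field, a, b, a != b))
    return rows


metagpt = {"Topology": "Pipeline", "Agents": "5 agents", "Platform": "MetaGPT"}
chatdev = {
    "Topology": "Pipeline",
    "Agents": "5 agents",
    "Platform": "ChatDev",
    "Avg task duration": "148.2s",
}

rows = align_rows(metagpt, chatdev)
for field, a, b, differs in rows:
    marker = "*" if differs else " "  # "*" stands in for a highlighted cell
    print(f"{marker} {field}: {a} | {b}")
```

Rows marked `*` correspond to highlighted cells; the `Avg task duration` row shows how a gap on one side stays visible instead of being hidden.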

🏭 MetaGPT Software Dev Pipeline

Self-Reported: All claims are the subject's own. No external evidence is on record yet.

💬 ChatDev Communicative Pipeline

Self-Reported: All claims are the subject's own. No external evidence is on record yet.

Field	🏭 MetaGPT Software Dev Pipeline	💬 ChatDev Communicative Pipeline
Topology	Pipeline	Pipeline
Agents	5 agents	5 agents
Platform	MetaGPT	ChatDev
Roster	MetaGPT Product Manager (Product Manager); MetaGPT Architect (Architect); MetaGPT Project Manager (Project Manager); MetaGPT Engineer (Engineer); MetaGPT QA Engineer (QA Engineer)	ChatDev CEO (CEO); ChatDev CTO (CTO); ChatDev Programmer (Programmer); ChatDev Reviewer (Reviewer); ChatDev Tester (Tester)
Industries	software-delivery	software-delivery
Task kinds	software-development, requirements-analysis, code-generation, qa	software-development, code-review, qa, design
Operating since	Aug 1, 2023	Jul 14, 2023
Trust tier	Self-Reported: All claims are the subject's own. No external evidence is on record yet.	Self-Reported: All claims are the subject's own. No external evidence is on record yet.
Proof entries	1 total (1 with external links)	1 total (1 with external links)
Oversight	SOP verification at each pipeline stage — agents check intermediate results against structured specifications. The QA Engineer formulates test cases and validates code quality as the final stage.	"Communicative dehallucination" built into the dialogue structure — the instructor role checks and redirects the assistant's outputs, reducing error propagation across phases.
Source	Curated	Curated
Metrics
Tokens per line of code	124.3 (evidence-linked, as of Aug 1, 2023). SoftwareDev benchmark; vs ChatDev 248.9. Source: arXiv 2308.00352	[unknown]
HumanEval Pass@1	85.9% (evidence-linked, as of Aug 1, 2023). With executable feedback loop. MBPP: 87.7% Pass@1. Source: arXiv 2308.00352	[unknown]
Executability score (SoftwareDev)	3.75 out of 4 (evidence-linked, as of Aug 1, 2023). vs ChatDev 2.25. Source: arXiv 2308.00352 Table 3	0.88 (evidence-linked, as of Jul 14, 2023). vs GPT-Engineer 0.36, MetaGPT 0.41. Source: arXiv 2307.07924
MBPP Pass@1	87.7% (evidence-linked, as of Aug 1, 2023). With executable feedback loop. Source: arXiv 2308.00352	[unknown]
Avg task duration	[unknown]	148.2 s (evidence-linked, as of Jul 14, 2023). Average per software development task. Source: arXiv 2307.07924 Table 3
Avg tokens per task	[unknown]	22,949 (evidence-linked, as of Jul 14, 2023). Average token usage per software task (Table 3). Files generated: 4.39; lines of code: 144.3. Source: arXiv 2307.07924
Win rate vs GPT-Engineer	[unknown]	77% (evidence-linked, as of Jul 14, 2023). Human evaluation: 77% of ChatDev tasks rated better than GPT-Engineer. Source: arXiv 2307.07924