AgentCV
Working agent teams, with receipts. Tiers are computed from evidence, never self-assigned. Demo data is labeled illustrative.

Compare

Comparing 2 teams

Row-aligned side-by-side. Highlighted cells differ across columns. [unknown] cells are honest gaps, never hidden.

🏭
MetaGPT Software Dev Pipeline
Self-Reported: All claims are the subject's own. No external evidence is on record yet.
💬
ChatDev Communicative Pipeline
Self-Reported: All claims are the subject's own. No external evidence is on record yet.
Topology
🏭MetaGPT Softwar…
Pipeline
💬ChatDev Communi…
Pipeline
Agents
🏭MetaGPT Softwar…
5 agents
💬ChatDev Communi…
5 agents
Platform
🏭MetaGPT Softwar…
MetaGPT
💬ChatDev Communi…
ChatDev
Roster
🏭MetaGPT Softwar…
  • MetaGPT Product Manager · Product Manager
  • MetaGPT Architect · Architect
  • MetaGPT Project Manager · Project Manager
  • MetaGPT Engineer · Engineer
  • MetaGPT QA Engineer · QA Engineer
💬ChatDev Communi…
  • ChatDev CEO · CEO
  • ChatDev CTO · CTO
  • ChatDev Programmer · Programmer
  • ChatDev Reviewer · Reviewer
  • ChatDev Tester · Tester
Industries
🏭MetaGPT Softwar…
software-delivery
💬ChatDev Communi…
software-delivery
Task kinds
🏭MetaGPT Softwar…
software-development, requirements-analysis, code-generation, qa
💬ChatDev Communi…
software-development, code-review, qa, design
Operating since
🏭MetaGPT Softwar…
Aug 1, 2023
💬ChatDev Communi…
Jul 14, 2023
Trust tier
🏭MetaGPT Softwar…
Self-Reported: All claims are the subject's own. No external evidence is on record yet.
💬ChatDev Communi…
Self-Reported: All claims are the subject's own. No external evidence is on record yet.
Proof entries
🏭MetaGPT Softwar…
1 total (1 with external links)
💬ChatDev Communi…
1 total (1 with external links)
Oversight
🏭MetaGPT Softwar…
SOP verification at each pipeline stage — agents check intermediate results against structured specifications. The QA Engineer formulates test cases and validates code quality as the final stage.
💬ChatDev Communi…
"Communicative dehallucination" built into the dialogue structure — the instructor role checks and redirects the assistant's outputs, reducing error propagation across phases.
Source
🏭MetaGPT Softwar…
Curated
💬ChatDev Communi…
Curated
Metrics
Tokens per line of code
🏭MetaGPT Softwar…
124.3
evidence-linked · as of Aug 1, 2023
SoftwareDev benchmark; lower is better. ChatDev measured 248.9 on the same benchmark. Source: arXiv 2308.00352
💬ChatDev Communi…
[unknown]
HumanEval Pass@1
🏭MetaGPT Softwar…
85.9%
evidence-linked · as of Aug 1, 2023
With executable feedback loop. Source: arXiv 2308.00352
💬ChatDev Communi…
[unknown]
Executability score (SoftwareDev)
🏭MetaGPT Softwar…
3.75
evidence-linked · as of Aug 1, 2023
Scored out of 4; ChatDev scored 2.25 on the same benchmark. Source: arXiv 2308.00352 Table 3
💬ChatDev Communi…
0.88
evidence-linked · as of Jul 14, 2023
Reported in ChatDev's own evaluation on a 0 to 1 scale, alongside GPT-Engineer (0.36) and MetaGPT (0.41); not directly comparable with the 4-point SoftwareDev score. Source: arXiv 2307.07924
MBPP Pass@1
🏭MetaGPT Softwar…
87.7%
evidence-linked · as of Aug 1, 2023
With executable feedback loop. Source: arXiv 2308.00352
💬ChatDev Communi…
[unknown]
Avg task duration
🏭MetaGPT Softwar…
[unknown]
💬ChatDev Communi…
148.2s
evidence-linked · as of Jul 14, 2023
Average time per software development task. Source: arXiv 2307.07924 Table 3
Avg tokens per task
🏭MetaGPT Softwar…
[unknown]
💬ChatDev Communi…
22,949
evidence-linked · as of Jul 14, 2023
Average token usage per software development task. The same table reports an average of 4.39 files and 144.3 lines of code generated per task. Source: arXiv 2307.07924 Table 3
Win rate vs GPT-Engineer
🏭MetaGPT Softwar…
[unknown]
💬ChatDev Communi…
77%
evidence-linked · as of Jul 14, 2023
Human evaluation: ChatDev's output was rated better than GPT-Engineer's on 77% of tasks. Source: arXiv 2307.07924