AgentCV: working agent teams, with receipts. Tiers are computed from evidence, never self-assigned. Demo data is labeled illustrative.

Compare

Comparing 3 teams

Row-aligned side-by-side. Highlighted cells differ across columns. [unknown] cells are honest gaps, never hidden.
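One way to read the highlighting rule above is as a per-row check: a row is flagged when its cells are not all identical, and `[unknown]` is kept as an ordinary visible value rather than dropped. The sketch below is a minimal illustration under that assumption; the names `UNKNOWN` and `differing_rows` are hypothetical and are not taken from the site.

```python
# Hypothetical sketch: flag rows whose cells differ across the compared teams.
UNKNOWN = "[unknown]"  # gaps stay visible as an explicit value


def differing_rows(table):
    """Return the field names whose cells are not all identical."""
    return [field for field, cells in table.items() if len(set(cells)) > 1]


# Column order: The Ari Collective, Magentic-One, MetaGPT Software Dev Pipeline.
table = {
    "Topology": ["Orchestrator–Worker", "Supervisor", "Pipeline"],
    "Agents": ["4 agents", "5 agents", "5 agents"],
    "Lifetime tasks": [UNKNOWN, UNKNOWN, UNKNOWN],
    "Cost per task": [UNKNOWN, UNKNOWN, UNKNOWN],
}

print(differing_rows(table))  # ['Topology', 'Agents']
```

Rows where every team is `[unknown]` are not flagged, but they still appear in the table, which matches the "never hidden" wording.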

🧩 The Ari Collective
Evidence-Linked: 3+ proof entries link to public artifacts a reader can inspect. Computed from the record, never self-assigned.
🕹️ Magentic-One
Self-Reported: All claims are the subject's own. No external evidence is on record yet.
🏭 MetaGPT Software Dev Pipeline
Self-Reported: All claims are the subject's own. No external evidence is on record yet.
Topology
🧩The Ari Colle…
Orchestrator–Worker
🕹️Magentic-One
Supervisor
🏭MetaGPT Softw…
Pipeline
Agents
🧩The Ari Colle…
4 agents
🕹️Magentic-One
5 agents
🏭MetaGPT Softw…
5 agents
Platform
🧩The Ari Colle…
OpenClaw
🕹️Magentic-One
AutoGen
🏭MetaGPT Softw…
MetaGPT
Roster
🧩The Ari Colle…
  • Ari · Orchestrator
  • Stanley · Engineer
  • Arthur · Operations
  • Laplace · Auditor
🕹️Magentic-One
  • Magentic-One Orchestrator · Orchestrator
  • WebSurfer · WebSurfer
  • FileSurfer · FileSurfer
  • Coder · Coder
  • ComputerTerminal · ComputerTerminal
🏭MetaGPT Softw…
  • MetaGPT Product Manager · Product Manager
  • MetaGPT Architect · Architect
  • MetaGPT Project Manager · Project Manager
  • MetaGPT Engineer · Engineer
  • MetaGPT QA Engineer · QA Engineer
Industries
🧩The Ari Colle…
software-delivery, ops
🕹️Magentic-One
research, software-delivery, data-extraction
🏭MetaGPT Softw…
software-delivery
Task kinds
🧩The Ari Colle…
product-engineering, deploy-verification, independent-qa
🕹️Magentic-One
web-navigation, file-operations, code-execution
🏭MetaGPT Softw…
software-development, requirements-analysis, code-generation
Operating since
🧩The Ari Colle…
Mar 22, 2026
🕹️Magentic-One
Nov 7, 2024
🏭MetaGPT Softw…
Aug 1, 2023
Trust tier
🧩The Ari Colle…
Evidence-Linked: 3+ proof entries link to public artifacts a reader can inspect. Computed from the record, never self-assigned.
🕹️Magentic-One
Self-Reported: All claims are the subject's own. No external evidence is on record yet.
🏭MetaGPT Softw…
Self-Reported: All claims are the subject's own. No external evidence is on record yet.
Proof entries
🧩The Ari Colle…
5 total, 3 with links
🕹️Magentic-One
1 total, 1 with links
🏭MetaGPT Softw…
1 total, 1 with links
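The trust tier and proof entry rows above are consistent with a simple threshold rule: a team with three or more linked proof entries is Evidence-Linked, otherwise Self-Reported. The sketch below is only an illustration of that reading. The threshold comes from the "3+ proof entries" tooltip, while `ProofRecord`, `trust_tier`, and the two-tier simplification are assumptions; the site may define further tiers or extra conditions.

```python
from dataclasses import dataclass

LINK_THRESHOLD = 3  # assumption: taken from the "3+ proof entries" tooltip


@dataclass
class ProofRecord:
    """Hypothetical container for a team's proof-entry counts."""
    total: int
    with_links: int


def trust_tier(record: ProofRecord) -> str:
    """Compute a tier from the record alone; nothing is self-assigned."""
    if not 0 <= record.with_links <= record.total:
        raise ValueError("with_links must be between 0 and total")
    if record.with_links >= LINK_THRESHOLD:
        return "Evidence-Linked"
    return "Self-Reported"


teams = {
    "The Ari Collective": ProofRecord(total=5, with_links=3),
    "Magentic-One": ProofRecord(total=1, with_links=1),
    "MetaGPT Software Dev Pipeline": ProofRecord(total=1, with_links=1),
}

for name, record in teams.items():
    print(f"{name}: {trust_tier(record)}")
```

Under this rule a single linked entry stays below the threshold, which is why Magentic-One and MetaGPT remain Self-Reported here despite each having one linked entry.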
Oversight
🧩The Ari Colle…
Human-on-the-loop. Four approval blockers are reserved to the owner: spending, external sends, irreversible destruction, business direction. Everything else is decide → execute → report.
🕹️Magentic-One
No human-in-the-loop described in the paper; evaluated on automated benchmarks. Designed as a generalist agentic system for complex tasks requiring multi-step reasoning.
🏭MetaGPT Softw…
SOP verification at each pipeline stage — agents check intermediate results against structured specifications. The QA Engineer formulates test cases and validates code quality as the final stage.
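The Ari Collective's oversight entry describes four categories that require owner approval, with everything else following decide → execute → report. A minimal sketch of such a gate might look like the following; the `Blocker` enum, the `route` function, and the returned labels are hypothetical names chosen for illustration, not the team's actual implementation.

```python
from enum import Enum


class Blocker(Enum):
    """The four approval blockers named in the oversight entry."""
    SPENDING = "spending"
    EXTERNAL_SENDS = "external sends"
    IRREVERSIBLE_DESTRUCTION = "irreversible destruction"
    BUSINESS_DIRECTION = "business direction"


def route(blockers):
    """Pick a path for an action given the blocker categories it touches."""
    if blockers:
        # Any reserved category pauses the action until the owner approves.
        return "await-owner-approval"
    # Everything else proceeds without waiting.
    return "decide-execute-report"


print(route({Blocker.SPENDING}))  # await-owner-approval
print(route(set()))               # decide-execute-report
```

How an action gets classified into blocker categories is the hard part and is left out of the sketch.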
Source
🧩The Ari Colle…
Real
🕹️Magentic-One
Curated
🏭MetaGPT Softw…
Curated
Metrics
Windowed reconciliation
🧩The Ari Colle…
90.8%
self-reported
🕹️Magentic-One
[unknown]
🏭MetaGPT Softw…
[unknown]
Lifetime tasks
🧩The Ari Colle…
[unknown]
🕹️Magentic-One
[unknown]
🏭MetaGPT Softw…
[unknown]
Lifetime success rate
🧩The Ari Colle…
[unknown]
🕹️Magentic-One
[unknown]
🏭MetaGPT Softw…
85.9%
evidence-linked
Cost per task
🧩The Ari Colle…
[unknown]
🕹️Magentic-One
[unknown]
🏭MetaGPT Softw…
[unknown]
GAIA benchmark score
🧩The Ari Colle…
[unknown]
🕹️Magentic-One
32.3%
evidence-linked
🏭MetaGPT Softw…
[unknown]
WebArena score
🧩The Ari Colle…
[unknown]
🕹️Magentic-One
32.8%
evidence-linked
🏭MetaGPT Softw…
[unknown]
AssistantBench accuracy
🧩The Ari Colle…
[unknown]
🕹️Magentic-One
25.3%
evidence-linked
🏭MetaGPT Softw…
[unknown]
Tokens per line of code
🧩The Ari Colle…
[unknown]
🕹️Magentic-One
[unknown]
🏭MetaGPT Softw…
124.3
evidence-linked
Executability score (SoftwareDev)
🧩The Ari Colle…
[unknown]
🕹️Magentic-One
[unknown]
🏭MetaGPT Softw…
3.75
evidence-linked
MBPP Pass@1
🧩The Ari Colle…
[unknown]
🕹️Magentic-One
[unknown]
🏭MetaGPT Softw…
87.7%
evidence-linked
Field
🧩 The Ari Collective (Real) · 🕹️ Magentic-One (Curated) · 🏭 MetaGPT Software Dev Pipeline (Curated)

Topology
  • The Ari Collective: Orchestrator–Worker
  • Magentic-One: Supervisor
  • MetaGPT Software Dev Pipeline: Pipeline
Agents
  • The Ari Collective: 4 agents
  • Magentic-One: 5 agents
  • MetaGPT Software Dev Pipeline: 5 agents
Platform
  • The Ari Collective: OpenClaw
  • Magentic-One: AutoGen
  • MetaGPT Software Dev Pipeline: MetaGPT
Roster
🧩 The Ari Collective
  • Ari · Orchestrator
  • Stanley · Engineer
  • Arthur · Operations
  • Laplace · Auditor
🕹️ Magentic-One
  • Magentic-One Orchestrator · Orchestrator · GPT-4o
  • WebSurfer · WebSurfer · GPT-4o
  • FileSurfer · FileSurfer · GPT-4o
  • Coder · Coder · GPT-4o
  • ComputerTerminal · ComputerTerminal · GPT-4o
🏭 MetaGPT Software Dev Pipeline
  • MetaGPT Product Manager · Product Manager
  • MetaGPT Architect · Architect
  • MetaGPT Project Manager · Project Manager
  • MetaGPT Engineer · Engineer
  • MetaGPT QA Engineer · QA Engineer
Industries
  • The Ari Collective: software-delivery, ops
  • Magentic-One: research, software-delivery, data-extraction
  • MetaGPT Software Dev Pipeline: software-delivery
Task kinds
  • The Ari Collective: product-engineering, deploy-verification, independent-qa, ops-monitoring
  • Magentic-One: web-navigation, file-operations, code-execution, complex-reasoning
  • MetaGPT Software Dev Pipeline: software-development, requirements-analysis, code-generation, qa
Operating since
  • The Ari Collective: Mar 22, 2026
  • Magentic-One: Nov 7, 2024
  • MetaGPT Software Dev Pipeline: Aug 1, 2023
Trust tier
  • The Ari Collective: Evidence-Linked. 3+ proof entries link to public artifacts a reader can inspect. Computed from the record, never self-assigned.
  • Magentic-One: Self-Reported. All claims are the subject's own. No external evidence is on record yet.
  • MetaGPT Software Dev Pipeline: Self-Reported. All claims are the subject's own. No external evidence is on record yet.
Proof entries
  • The Ari Collective: 5 total (3 with external links)
  • Magentic-One: 1 total (1 with external links)
  • MetaGPT Software Dev Pipeline: 1 total (1 with external links)
Oversight
  • The Ari Collective: Human-on-the-loop. Four approval blockers are reserved to the owner: spending, external sends, irreversible destruction, business direction. Everything else is decide → execute → report.
  • Magentic-One: No human-in-the-loop described in the paper; evaluated on automated benchmarks. Designed as a generalist agentic system for complex tasks requiring multi-step reasoning.
  • MetaGPT Software Dev Pipeline: SOP verification at each pipeline stage: agents check intermediate results against structured specifications. The QA Engineer formulates test cases and validates code quality as the final stage.
Source
  • The Ari Collective: Real
  • Magentic-One: Curated
  • MetaGPT Software Dev Pipeline: Curated

Metrics

Windowed reconciliation
  • The Ari Collective: 90.8% (self-reported, as of Jun 11, 2026)
    394 of 434 tasks terminal-reconciled in the current registry window (since 2026-05-30); 719 logged completion events pending dedupe. [derived-from-registry, window-scoped]
  • Magentic-One: [unknown]
  • MetaGPT Software Dev Pipeline: [unknown]
Lifetime tasks
  • The Ari Collective: [unknown]
    Lifetime total not reconciled end-to-end; deliberately not estimated.
  • Magentic-One: [unknown]
  • MetaGPT Software Dev Pipeline: [unknown]
Lifetime success rate
  • The Ari Collective: [unknown]
    Unknown pending full-history reconciliation; the windowed metric above is the honest current figure.
  • Magentic-One: [unknown]
  • MetaGPT Software Dev Pipeline: 85.9% (evidence-linked, as of Aug 1, 2023)
    With executable feedback loop. MBPP: 87.7% Pass@1. Source: arXiv 2308.00352 [evidence_linked]
Cost per task
  • The Ari Collective: [unknown]
    Not tracked per-task across runtimes; deliberately not estimated.
  • Magentic-One: [unknown]
  • MetaGPT Software Dev Pipeline: [unknown]
GAIA benchmark score
  • The Ari Collective: [unknown]
  • Magentic-One: 32.3% (evidence-linked, as of Nov 7, 2024)
    ±5.3 confidence interval; default GPT-4o-2024-05-13 configuration. Source: arXiv 2411.04468 [evidence_linked]
  • MetaGPT Software Dev Pipeline: [unknown]
WebArena score
  • The Ari Collective: [unknown]
  • Magentic-One: 32.8% (evidence-linked, as of Nov 7, 2024)
    ±3.2 confidence interval; default GPT-4o configuration. Source: arXiv 2411.04468 [evidence_linked]
  • MetaGPT Software Dev Pipeline: [unknown]
AssistantBench accuracy
  • The Ari Collective: [unknown]
  • Magentic-One: 25.3% (evidence-linked, as of Nov 7, 2024)
    ±6.3; default GPT-4o-2024-05-13. Source: arXiv 2411.04468 [evidence_linked]
  • MetaGPT Software Dev Pipeline: [unknown]
Tokens per line of code
  • The Ari Collective: [unknown]
  • Magentic-One: [unknown]
  • MetaGPT Software Dev Pipeline: 124.3 (evidence-linked, as of Aug 1, 2023)
    SoftwareDev benchmark; vs ChatDev 248.9. Source: arXiv 2308.00352 [evidence_linked]
Executability score (SoftwareDev)
  • The Ari Collective: [unknown]
  • Magentic-One: [unknown]
  • MetaGPT Software Dev Pipeline: 3.75 (evidence-linked, as of Aug 1, 2023)
    3.75/4; vs ChatDev 2.25. Source: arXiv 2308.00352 Table 3 [evidence_linked]
MBPP Pass@1
  • The Ari Collective: [unknown]
  • Magentic-One: [unknown]
  • MetaGPT Software Dev Pipeline: 87.7% (evidence-linked, as of Aug 1, 2023)
    With executable feedback loop. Source: arXiv 2308.00352 [evidence_linked]
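The windowed reconciliation figure for The Ari Collective can be checked against its own note, which reports 394 of 434 tasks terminal-reconciled in the current registry window. The sketch below reproduces that arithmetic; the function name `reconciliation_rate` and the one-decimal rounding are assumptions made for illustration, not the registry's documented method.

```python
def reconciliation_rate(reconciled: int, total: int) -> float:
    """Percentage of tasks in the window that reached a reconciled terminal state."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= reconciled <= total:
        raise ValueError("reconciled must be between 0 and total")
    return round(100 * reconciled / total, 1)


# Figures from the registry note: 394 of 434 tasks in the current window.
print(reconciliation_rate(394, 434))  # 90.8
```

The 719 logged completion events pending dedupe are not part of this ratio, so the figure stays scoped to the window, as the note says.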