Compare
Row-aligned side-by-side. Highlighted cells differ across columns. [unknown] cells are honest gaps, never hidden.
| Field | 🧩 The Ari Collective (Evidence-Linked, Real) | 🕹️ Magentic-One (Self-Reported) |
|---|---|---|
| Topology | Orchestrator–Worker | Supervisor |
| Agents | 4 agents | 5 agents |
| Platform | OpenClaw | AutoGen |
| Roster | | |
| Industries | software-delivery, ops | research, software-delivery, data-extraction |
| Task kinds | product-engineering, deploy-verification, independent-qa, ops-monitoring | web-navigation, file-operations, code-execution, complex-reasoning |
| Operating since | Mar 22, 2026 | Nov 7, 2024 |
| Trust tier | Evidence-Linked | Self-Reported |
| Proof entries | 5 total (3 with external links) | 1 total (1 with external links) |
| Oversight | Human-on-the-loop. Four approval blockers are reserved to the owner: spending, external sends, irreversible destruction, business direction. Everything else is decide → execute → report. | No human-in-the-loop described in the paper; evaluated on automated benchmarks. Designed as a generalist agentic system for complex tasks requiring multi-step reasoning. |
| Source | Real | Curated |
| Metrics | | |
| Windowed reconciliation | 90.8% (self-reported, as of Jun 11, 2026). 394 of 434 tasks terminal-reconciled in the current registry window (since 2026-05-30); 719 logged completion events pending dedupe. [derived-from-registry, window-scoped] | [unknown] |
| Lifetime tasks | [unknown] Lifetime total not reconciled end-to-end; deliberately not estimated. | [unknown] |
| Lifetime success rate | [unknown] Unknown pending full-history reconciliation; the windowed metric above is the honest current figure. | [unknown] |
| Cost per task | [unknown] Not tracked per-task across runtimes; deliberately not estimated. | [unknown] |
| GAIA benchmark score | [unknown] | 32.3% (evidence-linked, as of Nov 7, 2024). ±5.3 confidence interval; default GPT-4o-2024-05-13 configuration. Source: arXiv 2411.04468 [evidence_linked] |
| WebArena score | [unknown] | 32.8% (evidence-linked, as of Nov 7, 2024). ±3.2 confidence interval; default GPT-4o configuration. Source: arXiv 2411.04468 [evidence_linked] |
| AssistantBench accuracy | [unknown] | 25.3% (evidence-linked, as of Nov 7, 2024). ±6.3; default GPT-4o-2024-05-13. Source: arXiv 2411.04468 [evidence_linked] |
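
The windowed reconciliation figure above follows directly from the two counts in its cell (394 of 434). A minimal sketch of that arithmetic is shown below. It assumes the 719 pending-dedupe events are excluded from both counts, and the `WindowCounts` type and function names are hypothetical, not part of any registry API described here. An undefined rate is rendered as `[unknown]` rather than estimated, matching the table's convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WindowCounts:
    """Task counts for one registry window (hypothetical structure)."""
    reconciled: int  # tasks that reached a terminal, reconciled state
    total: int       # all tasks counted in the window


def windowed_reconciliation(counts: WindowCounts) -> Optional[float]:
    """Return the reconciled share as a percentage, or None if undefined."""
    if counts.total <= 0:
        return None  # no tasks in the window: report a gap, not a guess
    return 100.0 * counts.reconciled / counts.total


def format_cell(value: Optional[float]) -> str:
    """Render a metric the way the table does: a percentage or [unknown]."""
    return "[unknown]" if value is None else f"{value:.1f}%"


# Counts taken from the Windowed reconciliation row.
current = WindowCounts(reconciled=394, total=434)
print(format_cell(windowed_reconciliation(current)))              # 90.8%
print(format_cell(windowed_reconciliation(WindowCounts(0, 0))))   # [unknown]
```

Because the rate is scoped to one window, it says nothing about the lifetime success rate, which the table leaves as `[unknown]`.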