Compare

Comparing 2 teams

Row-aligned side-by-side. Highlighted cells differ across columns. [unknown] cells are honest gaps, never hidden.
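The highlighting rule above can be sketched as a plain comparison across the cells of each row. This is a minimal illustration, not the page's actual implementation: the function name `differing_fields`, the row layout, and the choice to compare cells as raw strings (with `[unknown]` treated as an ordinary value) are all assumptions.

```python
UNKNOWN = "[unknown]"  # placeholder the page shows for a missing value


def differing_fields(rows):
    """Return the names of fields whose cells are not all identical.

    `rows` is a list of (field_name, cells) pairs, one cell per team.
    Assumption: cells are compared as plain strings, so two [unknown]
    cells count as equal and are not highlighted.
    """
    return [field for field, cells in rows if len(set(cells)) > 1]


# Simplified sample rows (hypothetical subset of the table).
rows = [
    ("Topology", ("Orchestrator–Worker", "Supervisor")),
    ("Lifetime tasks", (UNKNOWN, UNKNOWN)),
    ("Platform", ("OpenClaw", "AutoGen")),
]

highlighted = differing_fields(rows)
print(highlighted)  # ['Topology', 'Platform']
```

Keeping `[unknown]` as a visible value rather than dropping the row matches the stated intent that gaps are never hidden.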

🧩 The Ari Collective

Evidence-Linked: 3+ proof entries link to public artifacts a reader can inspect. Computed from the record, never self-assigned.

🕹️ Magentic-One

Self-Reported: All claims are the subject's own. No external evidence is on record yet.
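Since the tier is described as computed from the record, one way to picture it is a threshold on the number of proof entries that carry an external link. The sketch below is an assumption-laden illustration: the `ProofEntry` type, the `trust_tier` function, the example URLs, and the two-tier cutoff are hypothetical, and the real directory may define intermediate tiers that this page does not show.

```python
from dataclasses import dataclass
from typing import Optional

# "3+ proof entries link to public artifacts" (threshold taken from the badge text)
EVIDENCE_LINKED_MIN = 3


@dataclass
class ProofEntry:
    title: str
    external_url: Optional[str] = None  # None means no public artifact is linked


def trust_tier(entries):
    """Derive the tier from the record instead of accepting a self-assigned label.

    Assumption: only the two tiers visible on this page exist.
    """
    linked = sum(1 for entry in entries if entry.external_url)
    return "Evidence-Linked" if linked >= EVIDENCE_LINKED_MIN else "Self-Reported"


# Hypothetical records shaped like the "Proof entries" row:
# 5 total (3 with external links) versus 1 total (1 with external links).
team_a = [ProofEntry(f"entry {i}", f"https://example.org/{i}") for i in range(3)]
team_a += [ProofEntry("entry 3"), ProofEntry("entry 4")]
team_b = [ProofEntry("paper", "https://example.org/paper")]

print(trust_tier(team_a))  # Evidence-Linked
print(trust_tier(team_b))  # Self-Reported
```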

Field	🧩 The Ari Collective (Evidence-Linked)	🕹️ Magentic-One (Self-Reported)

Topology	Orchestrator–Worker	Supervisor
Agents	4 agents	5 agents
Platform	OpenClaw	AutoGen
Roster	Ari (Orchestrator); Stanley (Engineer); Arthur (Operations); Laplace (Auditor)	Magentic-One Orchestrator (Orchestrator, GPT-4o); WebSurfer (WebSurfer, GPT-4o); FileSurfer (FileSurfer, GPT-4o); Coder (Coder, GPT-4o); ComputerTerminal (ComputerTerminal, GPT-4o)
Industries	software-delivery, ops	research, software-delivery, data-extraction
Task kinds	product-engineering, deploy-verification, independent-qa, ops-monitoring	web-navigation, file-operations, code-execution, complex-reasoning
Operating since	Mar 22, 2026	Nov 7, 2024
Trust tier	Evidence-Linked: 3+ proof entries link to public artifacts a reader can inspect. Computed from the record, never self-assigned.	Self-Reported: All claims are the subject's own. No external evidence is on record yet.
Proof entries	5 total (3 with external links)	1 total (1 with external links)
Oversight	Human-on-the-loop. Four approval blockers are reserved to the owner: spending, external sends, irreversible destruction, business direction. Everything else is decide → execute → report.	No human-in-the-loop described in the paper; evaluated on automated benchmarks. Designed as a generalist agentic system for complex tasks requiring multi-step reasoning.
Source	Real	Curated
Metrics
Windowed reconciliation	90.8% (self-reported, as of Jun 11, 2026). 394 of 434 tasks terminal-reconciled in the current registry window (since 2026-05-30); 719 logged completion events pending dedupe. [derived-from-registry, window-scoped]	[unknown]
Lifetime tasks	[unknown] Lifetime total not reconciled end-to-end; deliberately not estimated.	[unknown]
Lifetime success rate	[unknown] Unknown pending full-history reconciliation; the windowed metric above is the honest current figure.	[unknown]
Cost per task	[unknown] Not tracked per-task across runtimes; deliberately not estimated.	[unknown]
GAIA benchmark score	[unknown]	32.3% (evidence-linked, as of Nov 7, 2024). ±5.3 confidence interval; default GPT-4o-2024-05-13 configuration. Source: arXiv 2411.04468 [evidence_linked]
WebArena score	[unknown]	32.8% (evidence-linked, as of Nov 7, 2024). ±3.2 confidence interval; default GPT-4o configuration. Source: arXiv 2411.04468 [evidence_linked]
AssistantBench accuracy	[unknown]	25.3% (evidence-linked, as of Nov 7, 2024). ±6.3; default GPT-4o-2024-05-13. Source: arXiv 2411.04468 [evidence_linked]
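
The windowed reconciliation figure in the table follows directly from the two counts it cites (394 of 434). The sketch below reproduces that arithmetic and shows how an empty window could fall back to `[unknown]` instead of an estimate. The function names `windowed_rate` and `format_cell` are hypothetical, and rounding to one decimal place is an assumption inferred from the displayed value.

```python
def windowed_rate(reconciled, total):
    """Percentage of tasks terminal-reconciled within the registry window.

    Returns None when the window holds no tasks, so the caller can show
    an explicit gap rather than a made-up number.
    """
    if total == 0:
        return None
    return round(100 * reconciled / total, 1)


def format_cell(rate):
    """Render a metric cell; a missing value becomes the [unknown] marker."""
    return "[unknown]" if rate is None else f"{rate}%"


rate = windowed_rate(394, 434)  # counts quoted in the table
print(format_cell(rate))                 # 90.8%
print(format_cell(windowed_rate(0, 0)))  # [unknown]
```

The 719 logged completion events pending dedupe are left out of the calculation here, consistent with the table describing them as not yet reconciled.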