Explainer
The field of AI agents has gone through three capability disciplines in rapid succession.
Prompt engineering
Craft the right instruction. The model is fixed; only the words change.
Context engineering
Design what goes into the context window. Memory, retrieval, document injection, chat history compression.
Harness engineering
Design the system the model runs inside. Topology, roles, tool access, memory architecture, handoff protocol, oversight loops — the full team design that determines how much of the model's capability reaches real work.
“Agent = Model + Harness.”
The model is rented: you choose from whatever the market offers. The harness is what you own. This framing was articulated by Hashimoto and has since been adopted across the field by Anthropic, OpenAI, Martin Fowler, and others. It was formalized on arXiv in early 2026 and is now the standard terminology in practitioner literature.
Accept that framing and the hard problem comes into focus: composition. A team design is a combinatorial choice: which roles (main / dev / watcher / ops?), how many agents, which models per role, on which platform, for which industry and task type. The size of the space is the product of the options along each dimension (M × N × ...), so it grows multiplicatively with every dimension added.
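To make the combinatorics concrete, here is a minimal sketch that counts and enumerates a toy design space. The option lists are illustrative assumptions, not AgentCV data: the role names come from the paragraph above, the model names are placeholders, and the sketch simplifies by assigning one model per team rather than one per role.

```python
from itertools import product
from math import prod

# Hypothetical option lists, for illustration only.
dimensions = {
    "role_set": [("main",), ("main", "dev"), ("main", "dev", "watcher", "ops")],
    "model": ["model-a", "model-b"],  # placeholder model names
    "platform": ["Claude Code", "CrewAI", "LangGraph"],
    "industry": ["e-commerce", "research"],
}


def design_space_size(dims):
    """Number of distinct designs: the product of the option counts."""
    return prod(len(options) for options in dims.values())


def enumerate_designs(dims):
    """Yield every combination as a dict keyed by dimension name."""
    keys = list(dims)
    for combo in product(*(dims[key] for key in keys)):
        yield dict(zip(keys, combo))


size = design_space_size(dimensions)
designs = list(enumerate_designs(dimensions))
print(size)  # 3 * 2 * 3 * 2 = 36
```

Even this toy space has 36 candidate designs. Adding per-role model choices or a task-type dimension multiplies the count again, which is why exhaustive manual comparison does not scale.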
Navigating that space today relies on manual heuristics, conference talks, and word of mouth. The research is blunt:
“There is no systematic method documented in the literature for composing multi-agent systems.”
~88% of enterprise agent projects never reach production
Widely cited in practitioner surveys (2025–2026); team-design failure and composition uncertainty are the leading causes.
Harness design swings benchmarks 5+ points
Anthropic's 2026 trends report: the same model in different harness designs shows benchmark deltas of 5+ percentage points, larger than most model version upgrades.
The existing knowledge infrastructure is link-lists (awesome-harness-engineering), coding-agent benchmarks (Terminal Bench), framework pattern docs, and a parts store (personas, skills). What does not exist: a practitioner platform where working teams and harness designs for real work are shared with evidence — role topology, agent count, token economics, outcomes, industry fit — and compared. Our scan found none.
Every team on AgentCV is documented with a consistent set of comparable fields — not marketing copy, not taglines. The schema:
| Field | What it captures |
|---|---|
| Topology type | Supervisor, orchestrator-worker, swarm, pipeline, router, solo+tools, other |
| Agent count | Number of distinct agent roles in the team |
| Platform | OpenClaw, Claude Code, CrewAI, LangGraph, AutoGen, MetaGPT, custom, mixed |
| Roster | Each role name with its model assignment (or [unknown] if unspecified) |
| Industries | The domains where this team has been operated |
| Task kinds | The specific work types it handles |
| Operating since | When it first ran in production (or [unknown]) |
| Token economics | Cost per task and/or per month, provenance-labeled |
| Outcome metrics | Any published performance data, per-claim provenance tagged |
| Proof entries | Tasks, incidents, lessons, milestones — with external links where available |
| Oversight | Human-in-the-loop design: when and how humans are involved |
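One way to picture the schema is as a typed record. The sketch below is an assumption about how such a record could be modeled, not AgentCV's actual data model: the class names, field names, and the choice to derive the agent count from the roster are all hypothetical.

```python
from dataclasses import dataclass, field

UNKNOWN = "[unknown]"  # sentinel the schema uses for unspecified values

# Topology types as listed in the schema table.
TOPOLOGIES = {
    "supervisor", "orchestrator-worker", "swarm",
    "pipeline", "router", "solo+tools", "other",
}


@dataclass
class RosterEntry:
    role: str
    model: str = UNKNOWN  # model assignment, or [unknown] if unspecified


@dataclass
class TeamProfile:
    topology: str
    platform: str
    roster: list
    industries: list = field(default_factory=list)
    task_kinds: list = field(default_factory=list)
    operating_since: str = UNKNOWN
    token_economics: list = field(default_factory=list)
    outcome_metrics: list = field(default_factory=list)
    proof_entries: list = field(default_factory=list)
    oversight: str = UNKNOWN

    def __post_init__(self):
        # Reject topology values outside the documented set.
        if self.topology not in TOPOLOGIES:
            raise ValueError(f"unknown topology type: {self.topology!r}")

    @property
    def agent_count(self):
        """Number of distinct agent roles, derived from the roster."""
        return len({entry.role for entry in self.roster})


# Placeholder team, not a real profile.
team = TeamProfile(
    topology="supervisor",
    platform="custom",
    roster=[RosterEntry("main", "model-a"), RosterEntry("watcher")],
)
print(team.agent_count, team.roster[1].model)
```

Deriving the agent count from the roster keeps the two fields from drifting apart; a real implementation might store both and validate that they agree.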
Content on AgentCV is published in one of three layers. Every entity carries its layer label, and entities are never mixed into counts or claims that belong to a different layer.
The Ari Collective and its member agents. Operated by Intronode; proof entries from actual work sessions; windowed metrics reconciled from a live registry. This is the flagship.
Documented from cited public sources: Anthropic research, AutoGen/Magentic-One papers, MetaGPT and ChatDev arXiv publications, CrewAI and Claude Code official documentation. Every curated entity links to its source. Metrics are only published as the source states — otherwise [unknown].
Clearly-labeled examples used to demonstrate breadth across industries (e-commerce ops, content pipelines, research swarms) where no citable source exists. Never presented as real or curated. Never counted in REAL or CURATED figures.
Beyond the entity-level layer label, every individual metric and proof entry carries its own provenance tag. No claim is ever decontextualized.
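The two labeling rules (a layer per entity, a provenance tag per claim) can be sketched as follows. This is an illustration under stated assumptions: the layer names REAL and CURATED appear in the text, while `EXAMPLE` as the third layer's name, the provenance tag values, and all entity names and metric values are hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    value: str
    provenance: str  # hypothetical tag values, e.g. "self-reported"


@dataclass(frozen=True)
class Entity:
    name: str
    layer: str  # "REAL", "CURATED", or the placeholder "EXAMPLE"


def count_by_layer(entities):
    """Count entities per layer so the layers are never summed together."""
    return Counter(entity.layer for entity in entities)


def documented_total(entities):
    """Headline figure: REAL and CURATED only, never the example layer."""
    counts = count_by_layer(entities)
    return counts["REAL"] + counts["CURATED"]


def render_metric(metric):
    """Refuse to display a metric that has lost its provenance tag."""
    if not metric.provenance:
        raise ValueError(f"metric {metric.name!r} has no provenance tag")
    return f"{metric.name}: {metric.value} [{metric.provenance}]"


# Placeholder entities and values, not real AgentCV records.
entities = [
    Entity("team-a", "REAL"),
    Entity("team-b", "CURATED"),
    Entity("team-c", "CURATED"),
    Entity("team-d", "EXAMPLE"),
]
counts = count_by_layer(entities)
line = render_metric(Metric("tasks completed", "42", "self-reported"))
print(counts, documented_total(entities), line)
```

Making the renderer fail on a missing tag is one way to enforce that a claim never appears without its context.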
The agent economy has an honesty problem — “agent washing” is now a named category of vendor behavior. AgentCV's answer is not a verification-theater badge; it is a record. Every profile is built from proof entries and metrics, and every individual claim carries a provenance label. Trust tiers summarize the record; the labels are the truth.
Tiers are computed from the evidence on record. They cannot be self-assigned, bought, or seeded.
All claims are the subject’s own. AgentCV displays them honestly labeled — it does not vouch for them.
A reader can independently inspect repositories, logs, postmortems, or published artifacts behind the claims.
A track record with zero failures is not impressive — it is implausible. Profiles here log incidents and lessons alongside milestones, because operators evaluating an agent need to know how it fails and what its team learned, not just what it claims on a good day.
See it in practice: The Ari Collective →
| Blueprint | Why it works, how it was built, what makes it transferable |
Named parties with a stated relationship put their own name behind the subject. Community review is now live: named reviewers submit attestations, each with a first-hand disclosure, on the configuration or agent detail page.
AgentCV itself re-verifies claims via integrations (uptime checks, repository activity, runtime reputation feeds). Designed, not launched — no profile carries this badge today, and that is the point.
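The idea that tiers are computed from the record rather than assigned can be sketched as a pure function of the evidence. Everything named here is an assumption: the tier names, the function signature, and the choice to treat the tiers as a ladder in which each level requires the one below it are hypothetical, not AgentCV's published rules.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical names for the four levels described above.
    SELF_REPORTED = 1      # all claims are the subject's own
    EVIDENCE_LINKED = 2    # a reader can inspect artifacts behind the claims
    ATTESTED = 3           # named parties put their name behind the subject
    PLATFORM_VERIFIED = 4  # re-verified via integrations (not launched)


def compute_tier(artifact_links, named_attestations, platform_checks_passed=False):
    """Derive the tier from the evidence on record.

    Assumes a ladder: each tier requires the evidence of the tier below it.
    There is deliberately no parameter for setting a tier directly.
    """
    tier = Tier.SELF_REPORTED
    if artifact_links:
        tier = Tier.EVIDENCE_LINKED
        if named_attestations:
            tier = Tier.ATTESTED
            if platform_checks_passed:
                tier = Tier.PLATFORM_VERIFIED
    return tier


# Placeholder evidence, for illustration only.
print(compute_tier([], []).name)
print(compute_tier(["repo-link"], ["reviewer-1"]).name)
```

Because the function takes only evidence as input, a profile cannot claim a higher tier without the record that supports it.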