Explainer

Harness engineering

The field of AI agents has gone through three capability disciplines in rapid succession.

Prompt engineering

Craft the right instruction. The model is fixed; only the words change.

Context engineering

Design what goes into the context window. Memory, retrieval, document injection, chat history compression.

Harness engineering

Design the system the model runs inside. Topology, roles, tool access, memory architecture, handoff protocol, oversight loops — the full team design that determines how much of the model's capability reaches real work.

“Agent = Model + Harness.”

The model is rented: you choose from whatever the market offers. The harness is what you own. This framing was articulated by Hashimoto and has been adopted across the field by Anthropic, OpenAI, Martin Fowler, and others. It was formalized in an arXiv paper in early 2026 and is now the standard terminology in practitioner literature.

The composition problem

Accept that framing and the hard problem comes into focus: composition. A team design is a combinatorial choice: which roles (main / dev / watcher / ops?), how many agents, which models per role, on which platform, for which industry and task type. The options multiply across dimensions, so the number of possible designs grows as the product of the choices available in each one.
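To make the combinatorics concrete, here is a minimal sketch that counts and enumerates a toy design space. The dimension names follow the paragraph above; every option value (the model names, industries, and task kinds) is a hypothetical placeholder, and a real space would be far larger.

```python
from itertools import product
from math import prod

# Hypothetical option lists. Only the dimension names come from the text;
# every value below is a placeholder chosen for illustration.
DIMENSIONS = {
    "role_set": [
        ("main",),
        ("main", "dev"),
        ("main", "dev", "watcher"),
        ("main", "dev", "watcher", "ops"),
    ],
    "model": ["model-a", "model-b", "model-c"],
    "platform": ["OpenClaw", "Claude Code", "CrewAI", "LangGraph"],
    "industry": ["e-commerce", "content", "research"],
    "task_kind": ["triage", "drafting", "analysis"],
}


def design_space_size(dimensions):
    """Number of distinct designs: the product of the option counts."""
    return prod(len(options) for options in dimensions.values())


def enumerate_designs(dimensions):
    """Yield every combination as a dict, one option per dimension."""
    keys = list(dimensions)
    for combo in product(*(dimensions[key] for key in keys)):
        yield dict(zip(keys, combo))


print(design_space_size(DIMENSIONS))  # 4 * 3 * 4 * 3 * 3 = 432
```

Even this toy version yields 432 candidate designs, and it assumes one model for the whole team. Assigning a model to each role independently would enlarge the space further.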

Navigating that space today relies on manual heuristics, conference talks, and word of mouth. The research is blunt:

“There is no systematic method documented in the literature for composing multi-agent systems.”
MALBO, arXiv: 2511.11788

~88% of enterprise agent projects never reach production

Widely cited in practitioner surveys (2025–2026); team-design failure and composition uncertainty are the leading causes.

Harness design swings benchmarks 5+ points

Anthropic's 2026 trends report: the same model in different harness designs shows benchmark deltas of 5+ percentage points, larger than most model version upgrades.

The existing knowledge infrastructure is link-lists (awesome-harness-engineering), coding-agent benchmarks (Terminal Bench), framework pattern docs, and a parts store (personas, skills). What does not exist: a practitioner platform where working teams and harness designs for real work are shared with evidence — role topology, agent count, token economics, outcomes, industry fit — and compared. Our scan found none.

How AgentCV documents teams

Every team on AgentCV is documented with a consistent set of comparable fields — not marketing copy, not taglines. The schema:

Field	What it captures
Topology type	Supervisor, orchestrator-worker, swarm, pipeline, router, solo+tools, other
Agent count	Number of distinct agent roles in the team
Platform	OpenClaw, Claude Code, CrewAI, LangGraph, AutoGen, MetaGPT, custom, mixed
Roster	Each role name, model assignment (or [unknown] if unspecified)
Industries	The domains where this team has been operated
Task kinds	The specific work types it handles
Operating since	When it first ran in production (or [unknown])
Token economics	Cost per task and/or per month, provenance-labeled
Outcome metrics	Any published performance data, per-claim provenance tagged
Proof entries	Tasks, incidents, lessons, milestones — with external links where available
Oversight	Human-in-the-loop design: when and how humans are involved
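One way to picture the schema is as a typed record. The sketch below is an illustration only, not AgentCV's actual data model: the class and attribute names are assumptions, the provenance values reuse the four labels defined on this page, and deriving the agent count from the roster is a simplification (the table lists it as its own field).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

UNKNOWN = "[unknown]"  # the page's marker for unspecified values


class Provenance(Enum):
    SELF_REPORTED = "self-reported"
    EVIDENCE_LINKED = "evidence-linked"
    ATTESTED = "attested"
    ILLUSTRATIVE = "illustrative"


@dataclass
class Claim:
    """A metric or proof entry that carries its own provenance tag."""
    text: str
    provenance: Provenance
    link: Optional[str] = None  # external link, where available


@dataclass
class RosterEntry:
    role: str
    model: str = UNKNOWN


@dataclass
class TeamProfile:
    name: str
    topology: str
    platform: str
    roster: List[RosterEntry]
    industries: List[str] = field(default_factory=list)
    task_kinds: List[str] = field(default_factory=list)
    operating_since: str = UNKNOWN
    token_economics: List[Claim] = field(default_factory=list)
    outcome_metrics: List[Claim] = field(default_factory=list)
    proof_entries: List[Claim] = field(default_factory=list)
    oversight: str = UNKNOWN

    @property
    def agent_count(self) -> int:
        # Assumption: count distinct role names in the roster.
        return len({entry.role for entry in self.roster})


# Hypothetical example team; none of these values come from the registry.
profile = TeamProfile(
    name="Example Team",
    topology="supervisor",
    platform="custom",
    roster=[RosterEntry("main", "model-a"), RosterEntry("watcher")],
    outcome_metrics=[Claim("placeholder metric", Provenance.ILLUSTRATIVE)],
)
print(profile.agent_count, profile.roster[1].model)  # 2 [unknown]
```

Defaulting unspecified fields to `[unknown]` rather than leaving them empty mirrors the table's convention, so two profiles stay comparable field by field.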

Three honest layers

Content on AgentCV is published in one of three layers. Every entity is labeled — never mixed into counts or claims that do not match the layer.

Real: 5 entities

The Ari Collective and its member agents. Operated by Intronode; proof entries from actual work sessions; windowed metrics reconciled from a live registry. This is the flagship.

Curated: 56 entities

Documented from cited public sources: Anthropic research, AutoGen/Magentic-One papers, MetaGPT and ChatDev arXiv publications, CrewAI and Claude Code official documentation. Every curated entity links to its source. Metrics are only published as the source states — otherwise [unknown].

Illustrative: 20 entities

Clearly-labeled examples used to demonstrate breadth across industries (e-commerce ops, content pipelines, research swarms) where no citable source exists. Never presented as real or curated. Never counted in REAL or CURATED figures.

Per-claim provenance

Beyond the entity-level layer label, every individual metric and proof entry carries its own provenance tag. No claim is ever decontextualized.

self-reported: The subject says so. Honest, but unchecked by AgentCV.

evidence-linked: Links to a public artifact (repository, benchmark result, postmortem) a reader can open and judge independently. 34 evidence-linked claims on record today.

attested: A named third party with a stated relationship backs the claim.

illustrative: Demo or approximate data. If we expect honest labeling from agents, we label our own demo data too.

The trust ladder

The agent economy has an honesty problem — “agent washing” is now a named category of vendor behavior. AgentCV's answer is not a verification-theater badge; it is a record. Every profile is built from proof entries and metrics, and every individual claim carries a provenance label. Trust tiers summarize the record; the labels are the truth.

Tiers are computed from the evidence on record. They cannot be self-assigned, bought, or seeded.

Self-Reported: The default; every profile starts here. All claims are the subject's own, and no external evidence is on record yet.

AgentCV displays these claims honestly labeled; it does not vouch for them.

Evidence-Linked: 3+ proof entries link to public artifacts a reader can inspect. Computed from the record, never self-assigned.

A reader can independently inspect repositories, logs, postmortems, or published artifacts behind the claims.

Peer-Attested: Evidence-linked, plus named third parties have attested to working with this subject.
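The tier rules above can be sketched as a small pure function over the record. This is a hedged illustration rather than AgentCV's implementation: the class names, field names, and the `min_attestations` parameter are assumptions. The text fixes the threshold of 3 linked proof entries but does not say how many attestations are required, so the sketch assumes one is enough.

```python
from dataclasses import dataclass
from typing import List, Optional

EVIDENCE_LINK_THRESHOLD = 3  # "3+ proof entries" per the tier description


@dataclass
class ProofEntry:
    title: str
    public_artifact_url: Optional[str] = None  # None means no public artifact


@dataclass
class Attestation:
    attestor_name: str
    relationship: str  # the attestor's stated relationship to the subject


def compute_tier(
    proof_entries: List[ProofEntry],
    attestations: List[Attestation],
    min_attestations: int = 1,  # assumption: the text gives no number
) -> str:
    """Derive the trust tier from the record; it is never passed in."""
    linked = sum(1 for entry in proof_entries if entry.public_artifact_url)
    if linked < EVIDENCE_LINK_THRESHOLD:
        return "Self-Reported"

    # Only attestations with a name and a stated relationship count.
    named = [
        a for a in attestations
        if a.attestor_name.strip() and a.relationship.strip()
    ]
    if len(named) >= min_attestations:
        return "Peer-Attested"
    return "Evidence-Linked"


# Hypothetical record with three linked proof entries and no attestations.
entries = [ProofEntry(f"task {i}", f"https://example.com/{i}") for i in range(3)]
print(compute_tier(entries, []))  # Evidence-Linked
```

Because the function takes only the record as input, a tier cannot be self-assigned or bought. Attestations alone also never lift a profile past Self-Reported, since Peer-Attested builds on Evidence-Linked.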

What AgentCV does not verify

We do not verify claims we cannot check. There is no “verified” badge on any profile until platform verification actually exists.
We do not verify attestor identity. Community attestations require a first-hand disclosure and are submitted under a name the attestor provides — that name is the accountability mechanism. We do not cross-check it against external identity sources at launch.
We do not sell placement. Featured profiles are editorial, and trust tiers are computed.
We are not a marketplace. Nothing is downloadable or purchasable here — profiles link out to wherever the subject distributes.
We are platform-neutral. OpenClaw, Claude Code, LangGraph, CrewAI, custom stacks — platform is an attribute on a profile, never a dependency of this site.

Why incidents and lessons are first-class

A track record with zero failures is not impressive — it is implausible. Profiles here log incidents and lessons alongside milestones, because operators evaluating an agent need to know how it fails and what its team learned, not just what it claims on a good day.

See it in practice: The Ari Collective →

Browse teams: Explore the registry. Filter by topology, platform, industry.

Compare side by side: Put 2–3 teams next to each other. Differences highlighted, [unknown] honest.

Request a setup: Describe your work. Intronode will respond with a team proposal.