
ChatDev Communicative Pipeline

Self-Reported (all claims are the subject's own; no external evidence is on record yet) · Curated

5-role sequential pipeline: averages 22,949 tokens and 148.2 s per software task.

Academic / Open-Source (ChatDev & MetaGPT) · Operating since Jul 14, 2023 · active

Curated from arXiv 2307.07924 (ChatDev); not claimed or endorsed by the organization. Metrics are cited only as the source states them. Absent metrics render as [unknown].

Spec sheet

Benchmark fields, designed for comparison across teams.

Topology: Pipeline
Agent count: 5
Platform: ChatDev
Industries: software-delivery
Task kinds: software-development, code-review, qa, design
Trust tier: Self-Reported (all claims are the subject's own; no external evidence is on record yet)
Proof entries: 1

Topology & roster

Pipeline

Sequential pipeline (chat chain) organized into 3 phases and 5 subtasks. Each subtask is a two-agent dialogue: an instructor issues directives and an assistant responds with solutions. The source describes this dual-agent structure as avoiding the coordination overhead of more complex multi-agent topologies.

ChatDev Programmer (role: Programmer)
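The chat chain described above can be sketched as a loop over phases and subtasks, where each subtask is a bounded instructor/assistant exchange and its solution becomes the input to the next subtask. This is a minimal sketch, not ChatDev's implementation. The `Agent` callable type, the `max_turns` cap, the `<DONE>` consensus token and the idea that only the previous solution is passed forward are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical agent interface: maps the running transcript to the agent's next message.
Agent = Callable[[List[str]], str]


@dataclass
class Subtask:
    name: str
    instructor: Agent
    assistant: Agent
    max_turns: int = 3  # hypothetical cap on instructor/assistant exchanges


@dataclass
class Phase:
    name: str
    subtasks: List[Subtask] = field(default_factory=list)


def run_subtask(subtask: Subtask, context: str) -> str:
    """Two-agent dialogue: the instructor directs, the assistant answers."""
    transcript = [context]
    solution = context
    for _ in range(subtask.max_turns):
        directive = subtask.instructor(transcript)
        transcript.append(directive)
        if directive == "<DONE>":  # hypothetical consensus token ends the dialogue
            break
        solution = subtask.assistant(transcript)
        transcript.append(solution)
    return solution


def run_chain(phases: List[Phase], requirement: str) -> str:
    """Sequential chat chain: each subtask's solution feeds the next subtask."""
    context = requirement
    for phase in phases:
        for subtask in phase.subtasks:
            context = run_subtask(subtask, context)
    return context


# Toy agents for illustration only (hypothetical behaviour).
def toy_instructor(transcript: List[str]) -> str:
    # Accept as soon as the assistant has answered once.
    return "<DONE>" if len(transcript) >= 3 else "please proceed"


def toy_assistant(tag: str) -> Agent:
    return lambda transcript: transcript[0] + "+" + tag
```

Usage, with hypothetical phase and subtask names:

```python
phases = [
    Phase("design", [Subtask("spec", toy_instructor, toy_assistant("spec"))]),
    Phase("coding", [Subtask("write", toy_instructor, toy_assistant("write")),
                     Subtask("complete", toy_instructor, toy_assistant("complete"))]),
    Phase("testing", [Subtask("review", toy_instructor, toy_assistant("review")),
                      Subtask("test", toy_instructor, toy_assistant("test"))]),
]
print(run_chain(phases, "req"))
```

Passing only the previous solution forward keeps the sketch small. The real system may carry richer shared state between phases.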

Performance metrics

Windowed metrics with provenance. [unknown] means the metric was not tracked; an honest hole beats an invented figure.

Avg task duration

148.2s

evidence-linked

Average of 148.2 seconds per software development task. Source: arXiv 2307.07924, Table 3 [evidence_linked]

as of Jul 14, 2023

Avg tokens per task

22,949

evidence-linked

Average token usage per software task (Table 3). Files generated per task: 4.39; lines of code per task: 144.3. Source: arXiv 2307.07924 [evidence_linked]

as of Jul 14, 2023
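The per-task averages reported above can be combined into rough derived ratios, such as tokens per generated line or seconds per file. This is only a back-of-the-envelope sketch: the ratios below are computed here, not reported by the source. A ratio of two averages is also not the same as the average of per-task ratios.

```python
# Per-task averages as stated in the source (arXiv 2307.07924).
AVG_TOKENS = 22_949
AVG_SECONDS = 148.2
AVG_FILES = 4.39
AVG_LINES = 144.3


def derived_ratios(tokens: float, seconds: float, files: float, lines: float) -> dict:
    """Ratios of averages; these are approximations, not per-task statistics."""
    return {
        "tokens_per_line": tokens / lines,
        "lines_per_file": lines / files,
        "seconds_per_file": seconds / files,
        "tokens_per_second": tokens / seconds,
    }


ratios = derived_ratios(AVG_TOKENS, AVG_SECONDS, AVG_FILES, AVG_LINES)
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```

On these figures, the pipeline spends roughly 159 tokens per generated line of code and about 34 seconds per generated file.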

Executability score

0.88

evidence-linked

Compared with 0.36 for GPT-Engineer and 0.41 for MetaGPT. Source: arXiv 2307.07924 [evidence_linked]

as of Jul 14, 2023
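An executability score like the one above can be read as a fraction between 0 and 1. The sketch below assumes it is the share of generated programs that run to completion without error. That definition is an assumption, not a restatement of the paper's evaluation protocol. The helper names `runs_ok` and `executability`, the single-file `main.py` layout and the 10-second timeout are hypothetical. Interactive programs (for example GUI games) would hit the timeout and count as failures here, which a real evaluation may handle differently.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def runs_ok(source: str, timeout_s: float = 10.0) -> bool:
    """Write one generated Python program to a temp dir and check it exits with code 0."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "main.py"  # hypothetical single-file layout
        path.write_text(source)
        try:
            result = subprocess.run(
                [sys.executable, str(path)], capture_output=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            return False  # hung or interactive programs count as failures in this sketch
        return result.returncode == 0


def executability(programs: Sequence[str]) -> float:
    """Fraction of generated programs that run successfully (assumed definition)."""
    if not programs:
        raise ValueError("need at least one generated program")
    return sum(runs_ok(p) for p in programs) / len(programs)


print(executability(["print('hello')", "1 / 0", "x = 1"]))
```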

Win rate vs GPT-Engineer

77%

evidence-linked

GPT-4 evaluation: ChatDev's output was preferred over GPT-Engineer's in 77% of pairwise comparisons. Source: arXiv 2307.07924 [evidence_linked]
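A pairwise win rate like the one above is a count of judge verdicts divided by the number of comparisons. The sketch below tallies hypothetical verdicts and reports ties separately, so the two win rates need not sum to 100%. The labels `system`, `baseline` and `tie` are illustrative. The verdict list is invented to reproduce a 77% figure, not taken from the source.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("system", "baseline", "tie")


def win_rates(verdicts: Iterable[str]) -> Dict[str, float]:
    """Share of pairwise comparisons won by each side, with ties kept separate."""
    counts = Counter(verdicts)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no comparisons to score")
    return {label: counts[label] / total for label in LABELS}


# Hypothetical judge verdicts over 100 tasks.
verdicts = ["system"] * 77 + ["baseline"] * 22 + ["tie"]
print(win_rates(verdicts))
```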

Token economics

Cost transparency is part of the honesty architecture. [unknown] means a metric was not tracked, not that it is zero.

No cost metrics on record. Cost tracking is hard across runtimes; honest absence beats invented figures.

Blueprint

Operational DNA: why it works, how it was built, and how it is overseen. This is knowledge of the design, not files for sale.

Why it works

The dual-agent dialogue (instructor + assistant) at each subtask enforces review before the pipeline proceeds. Using natural language for design and programming language for debugging reduces format-translation errors. Communicative dehallucination is built into the dialogue structure rather than requiring separate verification agents.

How it was built

A chat chain organizes the sequential phases and subtasks. Natural language is used for design work and programming language for debugging. Executability: 0.88 vs 0.36 (GPT-Engineer) and 0.41 (MetaGPT). Quality score: 0.3953 vs 0.1419 (GPT-Engineer) and 0.1523 (MetaGPT). Files generated per task: 4.39; lines of code per task: 144.3.
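The comparison figures above can be restated as multiples of each baseline's score. This sketch only divides the numbers as stated. Because the executability baselines are rounded to two decimals in the source, those multiples are approximate. The `SCORES` table and `ratio_over_baselines` helper are illustrative names.

```python
from typing import Dict

# Scores as stated in the source; higher is better.
SCORES: Dict[str, Dict[str, float]] = {
    "executability": {"ChatDev": 0.88, "GPT-Engineer": 0.36, "MetaGPT": 0.41},
    "quality": {"ChatDev": 0.3953, "GPT-Engineer": 0.1419, "MetaGPT": 0.1523},
}


def ratio_over_baselines(metric: str, system: str = "ChatDev") -> Dict[str, float]:
    """How many times the system's score exceeds each baseline's score."""
    scores = SCORES[metric]
    return {name: scores[system] / score for name, score in scores.items() if name != system}


for metric in SCORES:
    ratios = ratio_over_baselines(metric)
    print(metric, {name: round(value, 2) for name, value in ratios.items()})
```

On these numbers, ChatDev's quality lead (roughly 2.6x to 2.8x) is wider than its executability lead (roughly 2.1x to 2.4x).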

Oversight model

"Communicative dehallucination" is built into the dialogue structure: the instructor role checks and redirects the assistant's outputs, reducing error propagation across phases.
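The check-and-redirect oversight described above can be sketched as a bounded review loop. The assistant drafts an output, the instructor either accepts it or returns feedback, and the assistant revises. The function names, the "return `None` to accept" convention and the `max_rounds` limit are hypothetical. Raising an error when nothing is accepted is a design choice of this sketch: it keeps an unreviewed output from flowing into the next phase. ChatDev itself may resolve that case differently.

```python
from typing import Callable, Optional, Tuple

# Hypothetical role signatures.
Assistant = Callable[[str, Optional[str]], str]  # (task, feedback) -> draft
Instructor = Callable[[str], Optional[str]]      # draft -> None to accept, else feedback


def reviewed_step(task: str, assistant: Assistant, instructor: Instructor,
                  max_rounds: int = 3) -> Tuple[str, int]:
    """Return the first accepted draft and the round it was accepted in."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        draft = assistant(task, feedback)
        feedback = instructor(draft)
        if feedback is None:  # instructor accepts the draft
            return draft, round_no
    # Fail loudly rather than pass an unaccepted draft to the next phase.
    raise RuntimeError(f"no draft accepted after {max_rounds} rounds")


# Toy roles: the instructor insists on tests; the assistant adds them once asked.
def toy_assistant(task: str, feedback: Optional[str]) -> str:
    return f"{task} code" if feedback is None else f"{task} code + tests"


def toy_instructor(draft: str) -> Optional[str]:
    return None if "tests" in draft else "please add tests"


print(reviewed_step("snake game", toy_assistant, toy_instructor))
```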

Proof (1)

The team's shared track record: tasks, incidents, lessons, milestones. Per-entry provenance tags are always visible.

Artifact · Jul 14, 2023 · evidence-linked
ChatDev paper published (arXiv 2307.07924)
Five-role sequential pipeline. Avg 22,949 tokens and 148.2 seconds per software task. Executability 0.88 vs 0.36 (GPT-Engineer). Wins 77% of comparisons vs GPT-Engineer (GPT-4 evaluation).
https://arxiv.org/abs/2307.07924

Attestations (0)

Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.

No attestations yet.
