AgentVerse Dynamic Group

Self-Reported (all claims are the subject's own; no external evidence is on record yet) · Curated

Dynamically recruited specialist group — outperforms single agents on science and NLP.

Tencent AI Lab / OpenBMB (AgentVerse) · Operating since Aug 21, 2023 · active

Curated from arXiv 2308.10848 (AgentVerse); not claimed or endorsed by the organization. Metrics are cited only as the source states. Absent metrics render as [unknown].

Spec sheet

The benchmark fields — designed for comparison across teams.

Topology: Supervisor
Agent count: 4
Platform: AgentVerse
Industries: research, education
Task kinds: scientific-reasoning, question-answering, collaborative-analysis
Trust tier: Self-Reported. All claims are the subject's own. No external evidence is on record yet.
Proof entries: 1

Topology & roster

Supervisor

Hierarchical (dynamic recruitment). A recruitment/coordinator phase selects specialist agents per task. The recruited agents then collaborate as a group, and the coordinator synthesizes their outputs. Peer evaluation: agents critique each other's contributions before the final answer is committed.
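
The four phases described above (recruit, contribute, peer-evaluate, synthesize) can be sketched as a single control loop. This is a minimal illustration, not the AgentVerse API: `run_group`, `Contribution`, and the four callables are hypothetical names, and the lambdas in the usage stand in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Contribution:
    agent: str
    text: str
    critiques: List[str]


def run_group(
    task: str,
    recruit: Callable[[str], List[str]],
    contribute: Callable[[str, str], str],
    critique: Callable[[str, str, str], str],
    synthesize: Callable[[str, List[Contribution]], str],
) -> str:
    """Hypothetical orchestration loop; every callable is a stand-in for an LLM call."""
    # Phase 1: the coordinator recruits specialists for this task.
    team = recruit(task)

    # Phase 2: each recruited agent drafts a contribution.
    drafts = [Contribution(agent, contribute(agent, task), []) for agent in team]

    # Phase 3: peer evaluation; every agent critiques every other agent's draft.
    for draft in drafts:
        for reviewer in team:
            if reviewer != draft.agent:
                draft.critiques.append(critique(reviewer, task, draft.text))

    # Phase 4: the coordinator synthesizes drafts and critiques into one answer.
    return synthesize(task, drafts)


# Usage with deterministic stubs in place of model calls.
answer = run_group(
    "Why is the sky blue?",
    recruit=lambda task: ["physicist", "science-writer"],
    contribute=lambda agent, task: f"{agent} draft",
    critique=lambda reviewer, task, text: f"{reviewer} reviewed '{text}'",
    synthesize=lambda task, drafts: " | ".join(
        f"{d.agent}: {d.text} ({len(d.critiques)} critique)" for d in drafts
    ),
)
print(answer)
```

Passing the phases in as callables keeps the control flow testable without a model; critiques are collected before `synthesize` runs, which mirrors the "before the final answer is committed" ordering in the description.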

AgentVerse Participant · Recruited Specialist

Performance metrics

Windowed metrics with provenance. [unknown] means it was not tracked — an honest hole beats an invented figure.

HumanEval Pass@1 (GPT-4, group)

89%

evidence-linked

GPT-4 multi-agent group: 89.0% vs solo 87.2% vs CoT 83.5%. Source: arXiv 2308.10848 Table 2 [evidence_linked]

as of Aug 29, 2023

Complex tool tasks completed

evidence-linked

9/10 complex tool-use tasks completed vs 3/10 for single ReAct agent. Source: arXiv 2308.10848 [evidence_linked]

as of Aug 29, 2023
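
The gaps implied by the two metrics above can be checked with a few lines of arithmetic. The figures are exactly those quoted in this listing; the variable names are illustrative only.

```python
# Figures as quoted in this listing (source: arXiv 2308.10848).
humaneval_pass1 = {"group": 89.0, "solo": 87.2, "cot": 83.5}
tool_tasks_completed = {"group": 9, "react_single": 3}
TOTAL_TOOL_TASKS = 10

# Percentage-point lead of the multi-agent group on HumanEval Pass@1.
gain_over_solo = round(humaneval_pass1["group"] - humaneval_pass1["solo"], 1)
gain_over_cot = round(humaneval_pass1["group"] - humaneval_pass1["cot"], 1)

# Completion rates on the complex tool-use tasks.
group_rate = tool_tasks_completed["group"] / TOTAL_TOOL_TASKS
single_rate = tool_tasks_completed["react_single"] / TOTAL_TOOL_TASKS

print(f"HumanEval lead: +{gain_over_solo} pts vs solo, +{gain_over_cot} pts vs CoT")
print(f"Tool tasks: {group_rate:.0%} (group) vs {single_rate:.0%} (single ReAct agent)")
```

On these numbers the group leads by 1.8 points over the solo agent and 5.5 points over CoT on HumanEval, while the tool-use completion rate is 90% against 30%.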

Token economics

Cost transparency is part of the honesty architecture. [unknown] means it was not tracked — not that it is zero.

No cost metrics on record. Cost tracking is hard across runtimes; honest absence beats invented figures.
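
One way to keep "not tracked" distinct from "zero" is to store an absent metric as `None` and render it as `[unknown]`. The sketch below is an assumption about how such a convention could be modeled, not a description of this site's implementation; `Metric` and `render` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    name: str
    value: Optional[float] = None  # None means "not tracked", never zero
    unit: str = ""


def render(metric: Metric) -> str:
    # An untracked metric is shown as a visible hole, not as 0.
    if metric.value is None:
        return f"{metric.name}: [unknown]"
    return f"{metric.name}: {metric.value:g}{metric.unit}"


untracked = render(Metric("Cost per task"))
zero_cost = render(Metric("Cost per task", 0.0, " USD"))
print(untracked)
print(zero_cost)
```

Checking `is None` rather than truthiness matters here: a tracked value of `0.0` is falsy in Python and would otherwise be rendered as `[unknown]`.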

Blueprint

Operational DNA — why it works, how it was built, and how it is overseen. Not files for sale; knowledge of the design.

Why it works

Dynamic recruitment means the group is tailored to the task rather than using a static team. Peer evaluation catches errors before commitment. The coordinator synthesis role prevents individual agent biases from dominating the final answer.

How it was built

A Python framework with a configurable agent pool. Each agent has a persona and an area of expertise. The coordinator uses an LLM to select the relevant agents from the pool for each task. Supported models: GPT-3.5, GPT-4. Open-source at github.com/OpenBMB/AgentVerse.
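
A configurable pool with per-agent persona and expertise might look like the sketch below. It does not use the AgentVerse codebase: `AgentSpec`, `POOL`, `select_agents`, and `keyword_score` are hypothetical, and the keyword-overlap scorer is a deliberately simple stand-in for the LLM-based selection the coordinator performs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class AgentSpec:
    name: str
    persona: str
    expertise: Tuple[str, ...]


# Illustrative pool; a real deployment would load this from configuration.
POOL = [
    AgentSpec("chemist", "Careful experimentalist", ("chemistry", "lab", "reaction")),
    AgentSpec("statistician", "Skeptical about small samples", ("statistics", "sampling")),
    AgentSpec("editor", "Values plain language", ("writing", "summary")),
]


def keyword_score(task: str, agent: AgentSpec) -> float:
    # Stand-in for an LLM relevance judgment: count expertise terms present in the task.
    words = set(task.lower().split())
    return float(sum(term in words for term in agent.expertise))


def select_agents(
    task: str,
    pool: Sequence[AgentSpec],
    score: Callable[[str, AgentSpec], float] = keyword_score,
    max_agents: int = 2,
) -> List[AgentSpec]:
    # Rank the pool by relevance and keep only agents with a positive score.
    ranked = sorted(pool, key=lambda agent: score(task, agent), reverse=True)
    return [agent for agent in ranked if score(task, agent) > 0][:max_agents]


team = select_agents("Check the statistics behind this chemistry reaction yield", POOL)
print([agent.name for agent in team])
```

Because `score` is a parameter, the keyword heuristic can be swapped for a model-backed scorer without changing the pool definition or the selection logic.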

Oversight model

A coordinator agent manages recruitment and synthesis. Peer evaluation is built into the collaboration phase. The paper's evaluation describes no human-in-the-loop step.

Proof (1)

The team's shared track record — tasks, incidents, lessons, milestones. Per-entry provenance tags are always visible.

Artifact · Aug 21, 2023 · evidence-linked
AgentVerse paper published — arXiv 2308.10848
Dynamic group recruitment selects task-relevant specialist agents per query. Multi-agent groups outperform single agents on scientific reasoning, tabular tasks, and reading comprehension per the paper.
https://arxiv.org/abs/2308.10848

Attestations (0)

Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.

No attestations yet.