AutoGen Group Chat

Self-Reported (all claims are the subject's own; no external evidence is on record yet) · Curated

Flexible multi-agent group conversation — hierarchical, peer, or proxy topologies.

Microsoft Research (AutoGen) · Operating since Aug 16, 2023 · active

Curated from arXiv 2308.08155 — AutoGen — not claimed by or endorsed by the organization. Metrics cited only as the source states. Absent metrics render as [unknown].

Spec sheet

The benchmark fields — designed for comparison across teams.

Topology: Supervisor
Agent count: 3
Platform: AutoGen
Industries: software-delivery, research, data-extraction
Task kinds: multi-agent-dialogue, coding, problem-solving, human-in-loop
Trust tier: Self-Reported
Proof entries: 1

Topology & roster

Supervisor

Flexible. Supports two-agent dialogue, group chat (all agents receive all messages), hierarchical (a manager coordinates workers), and proxy patterns (human-in-the-loop via UserProxy). Each GroupChat is driven by a GroupChatManager that selects the next speaker. Speaker selection strategies: auto, round-robin, random, or manual.
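As a rough illustration of the speaker selection strategies listed above, the sketch below models next-speaker selection in plain Python. It is not AutoGen's implementation: the function name select_next_speaker, the chooser callback, and the agent names are hypothetical. The auto and manual strategies are reduced to a caller-supplied callback that stands in for a model or a human making the choice.

```python
import random
from typing import Callable, Optional, Sequence


def select_next_speaker(
    agents: Sequence[str],
    last_speaker: Optional[str],
    strategy: str = "round_robin",
    rng: Optional[random.Random] = None,
    chooser: Optional[Callable[[Sequence[str]], str]] = None,
) -> str:
    """Pick the next speaker name (hypothetical helper, not an AutoGen API)."""
    if not agents:
        raise ValueError("agents must not be empty")

    if strategy == "round_robin":
        # Start with the first agent, then cycle through the list in order.
        if last_speaker is None or last_speaker not in agents:
            return agents[0]
        return agents[(agents.index(last_speaker) + 1) % len(agents)]

    if strategy == "random":
        # Uniform choice; pass a seeded rng for reproducible runs.
        return (rng or random.Random()).choice(list(agents))

    if strategy in ("auto", "manual"):
        # Assumption: a callback stands in for the model ("auto")
        # or the human operator ("manual") that makes the choice.
        if chooser is None:
            raise ValueError(f"strategy {strategy!r} needs a chooser callback")
        choice = chooser(agents)
        if choice not in agents:
            raise ValueError(f"chooser returned unknown agent {choice!r}")
        return choice

    raise ValueError(f"unknown strategy {strategy!r}")
```

With a roster of planner, coder, and reviewer, round-robin selection after reviewer wraps back to planner, while the random strategy may return any of the three.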

AutoGen Conversable Agent (ConversableAgent)

Performance metrics

Windowed metrics with provenance. [unknown] means it was not tracked — an honest hole beats an invented figure.

MATH dataset accuracy

69.5%

evidence-linked

Full MATH test set; GPT-4 alone: 55.18%. Source: arXiv 2308.08155 [evidence_linked]

as of Aug 16, 2023

ALFWorld 3-agent gain vs 2-agent

~15%

evidence-linked

3-agent grounding system: ~15% performance gain on 134 ALFWorld unseen tasks vs 2-agent baseline. Source: arXiv 2308.08155 [evidence_linked]

as of Aug 16, 2023

Token economics

Cost transparency is part of the honesty architecture. [unknown] means it was not tracked — not that it is zero.

No cost metrics on record. Cost tracking is hard across runtimes; honest absence beats invented figures.

Blueprint

Operational DNA — why it works, how it was built, and how it is overseen. Not files for sale; knowledge of the design.

Why it works

Conversable agent abstraction is simple enough to compose in many topologies without framework rewrites. Human-in-loop proxy enables controlled autonomy. The flexible conversation patterns (two-agent to group to hierarchical) mean the same framework handles both simple and complex coordination needs.

How it was built

Python package. Agents are defined with a name, a system_message, and capabilities (code execution, tool use, etc.). GroupChat connects agents via a GroupChatManager. Supports OpenAI, Azure, Claude, and local models. The AutoGenBench tool provides isolated benchmark evaluation.
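To make the group chat pattern concrete, here is a minimal sketch of agents defined by a name and a system_message, where every message is delivered to all other participants. The classes Agent and GroupChatSketch, the inbox field, and the broadcast method are hypothetical and do not mirror the AutoGen package's actual classes; no model calls are made.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Message = Tuple[str, str]  # (sender name, content)


@dataclass
class Agent:
    """Hypothetical stand-in for an agent: a name plus a system message."""
    name: str
    system_message: str
    inbox: List[Message] = field(default_factory=list)


@dataclass
class GroupChatSketch:
    """Hypothetical group chat in which all agents receive all messages."""
    agents: List[Agent]
    transcript: List[Message] = field(default_factory=list)

    def broadcast(self, sender: str, content: str) -> None:
        # Record the message once in the shared transcript...
        self.transcript.append((sender, content))
        # ...then deliver it to every agent except the sender.
        for agent in self.agents:
            if agent.name != sender:
                agent.inbox.append((sender, content))
```

A manager component would sit on top of such a structure, choosing who speaks next and calling broadcast with each reply.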

Oversight model

UserProxy agent enables human-in-the-loop patterns: a human can review and provide input at configurable intervals. Configurable human input modes: ALWAYS, NEVER, TERMINATE. Code execution can be sandboxed via Docker.
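The three input modes can be pictured as a small decision rule. The sketch below is an assumption-laden illustration, not AutoGen's code: the enum HumanInputMode and the function should_ask_human are hypothetical, and TERMINATE is assumed to mean that the human is consulted only when the conversation is about to end.

```python
from enum import Enum


class HumanInputMode(Enum):
    """Mode names as listed in the oversight model above."""
    ALWAYS = "ALWAYS"
    NEVER = "NEVER"
    TERMINATE = "TERMINATE"


def should_ask_human(mode: HumanInputMode, about_to_terminate: bool) -> bool:
    """Decide whether to pause for human input (hypothetical helper)."""
    if mode is HumanInputMode.ALWAYS:
        # A human reviews every turn.
        return True
    if mode is HumanInputMode.NEVER:
        # Fully autonomous: never pause for a human.
        return False
    # Assumption: TERMINATE asks only when the chat is about to end.
    return about_to_terminate
```

Under these assumptions, TERMINATE gives a middle ground: agents run unattended during the exchange, and a human gets the final say before the conversation closes.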

Proof (1)

The team's shared track record — tasks, incidents, lessons, milestones. Per-entry provenance tags are always visible.

Artifact · Aug 16, 2023 · evidence-linked
AutoGen paper published — arXiv 2308.08155
Flexible multi-agent framework: two-agent, group chat, hierarchical, and proxy (human-in-loop) patterns. Open-source. Used across coding, math, QA, and decision-making tasks.
https://arxiv.org/abs/2308.08155

Attestations (0)

Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.

No attestations yet.