25 autonomous NPC agents with memory, reflection, and planning, whose social organization emerges from their interactions.
The benchmark fields — designed for comparison across teams.
Peer (emergent social network). 25 agents act independently in a shared environment. No central orchestrator. Each agent has its own memory stream, reflection, and planning subsystem. Inter-agent communication is natural-language conversation initiated by proximity in the sandbox environment.
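Proximity-triggered communication can be sketched as follows. This is a minimal sketch that assumes a 2-D sandbox grid and a hypothetical fixed conversation radius. In the paper, the language model decides whether an agent starts a dialogue based on its memories. Here a plain distance check stands in for that decision, and a "conversation" just merges the two agents' known facts. `Agent`, `nearby_pairs`, `converse`, and `CONVERSATION_RADIUS` are illustrative names, not from the source.

```python
import math
from dataclasses import dataclass, field

# Hypothetical radius in grid units. In the paper an LLM decides whether to
# start a dialogue; this distance check is only a stand-in for that decision.
CONVERSATION_RADIUS = 2.0


@dataclass
class Agent:
    name: str
    x: float
    y: float
    known_facts: set = field(default_factory=set)


def nearby_pairs(agents, radius=CONVERSATION_RADIUS):
    """Return (name, name) pairs of agents close enough to start talking."""
    pairs = []
    for i, a in enumerate(agents):
        for b in agents[i + 1:]:
            if math.dist((a.x, a.y), (b.x, b.y)) <= radius:
                pairs.append((a.name, b.name))
    return pairs


def converse(a, b):
    """Toy conversation: both agents leave knowing the union of their facts."""
    shared = a.known_facts | b.known_facts
    a.known_facts = set(shared)
    b.known_facts = set(shared)


# Usage: only Ana is close enough to Isabella to hear about the party.
agents = [
    Agent("Isabella", 0.0, 0.0, {"valentines_party"}),
    Agent("Ana", 1.0, 1.0),
    Agent("Ben", 10.0, 10.0),
]
by_name = {a.name: a for a in agents}
for n1, n2 in nearby_pairs(agents):
    converse(by_name[n1], by_name[n2])
```

No coordinator assigns conversations. Who talks to whom falls out of where the agents happen to be, which matches the "no central orchestrator" design described above.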
Windowed metrics with provenance. [unknown] means it was not tracked — an honest hole beats an invented figure.
Believability (TrueSkill rating): full architecture μ=29.89, σ=0.72, vs. no-memory baseline μ=21.21; Cohen's d=8.16 (about eight standard deviations). 100 Prolific evaluators. Source: arXiv 2304.03442 [evidence_linked]
6 of 453 agent responses (about 1.3%) hallucinated relationship facts. Source: arXiv 2304.03442 [evidence_linked]
Cost transparency is part of the honesty architecture. [unknown] means it was not tracked — not that it is zero.
Operational DNA — why it works, how it was built, and how it is overseen. Not files for sale; knowledge of the design.
Memory, reflection, and planning let each agent act with contextual awareness built up over time, not just in response to immediate inputs. Coordination emerges from individual behavior rather than top-down orchestration. The reflection subsystem converts short-term observations into long-term behavioral guidance. Ablation studies in the paper show that each component contributes to believable behavior.
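The reflection step can be sketched as an importance accumulator. This is a minimal sketch that assumes the trigger the paper describes: reflect once the summed importance of recent observations exceeds a threshold (the paper reports 150; treat that value as an assumption here). The `synthesize` hook is hypothetical. In the real system a language model produces the higher-level insight, while here any function from observations to a string works.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed default, following the threshold reported in the paper.
REFLECTION_THRESHOLD = 150


@dataclass
class Reflector:
    # Hypothetical hook standing in for the LLM that synthesizes insights.
    synthesize: Callable[[List[str]], str]
    threshold: int = REFLECTION_THRESHOLD
    pending: List[str] = field(default_factory=list)
    pending_importance: int = 0
    reflections: List[str] = field(default_factory=list)

    def observe(self, text: str, importance: int) -> None:
        """Record an observation (importance on a 1-10 scale); reflect if due."""
        self.pending.append(text)
        self.pending_importance += importance
        if self.pending_importance > self.threshold:
            # Turn the batch of short-term observations into one insight.
            self.reflections.append(self.synthesize(self.pending))
            self.pending.clear()
            self.pending_importance = 0


# Usage with a small threshold so the trigger fires quickly.
r = Reflector(synthesize=lambda obs: f"insight from {len(obs)} observations",
              threshold=15)
r.observe("Isabella mentioned a party", 8)
r.observe("Ana agreed to help decorate", 8)
```

In the paper, reflections are written back into the memory stream so that later retrieval can surface them. This sketch keeps them in a separate list for brevity.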
Custom sandbox environment inspired by The Sims (called Smallville in the paper). Each agent has three subsystems: a memory stream (time-tagged observations), reflection (periodic synthesis queries), and planning (daily plans updated from reflections). Per the paper, the agents ran on gpt-3.5-turbo (ChatGPT).
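Memory-stream retrieval can be sketched as a scored ranking. This is a minimal sketch that assumes the retrieval function described in the paper: each memory's score is the sum of recency (exponential decay since last access), importance, and relevance (cosine similarity between embeddings). Each term is min-max normalized and the weights are equal. The 0.995 per-hour decay, the field names, and the `retrieve` helper are illustrative assumptions, and the toy 2-D embeddings stand in for real text embeddings.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Memory:
    text: str
    created_hour: float      # time tag: when the observation was recorded
    last_access_hour: float  # refreshed whenever the memory is retrieved
    importance: float        # e.g. a 1-10 rating (assumed to come from an LLM)
    embedding: Sequence[float]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def retrieve(stream, query_embedding, now_hour, k=3, decay=0.995):
    """Rank memories by normalized recency + importance + relevance."""
    if not stream:
        return []
    recency = min_max([decay ** (now_hour - m.last_access_hour) for m in stream])
    importance = min_max([m.importance for m in stream])
    relevance = min_max([cosine(m.embedding, query_embedding) for m in stream])
    scores = [r + i + v for r, i, v in zip(recency, importance, relevance)]
    ranked = sorted(zip(stream, scores), key=lambda pair: pair[1], reverse=True)
    top = [m for m, _ in ranked[:k]]
    for m in top:
        m.last_access_hour = now_hour  # retrieval refreshes recency
    return top


# Usage: a party-related query favors the relevant, important, recent memories.
stream = [
    Memory("Isabella is planning a Valentine's Day party", 0, 0, 8, [1.0, 0.0]),
    Memory("Ate breakfast", 1, 1, 1, [0.0, 1.0]),
    Memory("Bought decorations", 2, 2, 5, [0.9, 0.1]),
]
top = retrieve(stream, [1.0, 0.0], now_hour=3, k=2)
```

The top-ranked memories are what get placed into the language-model prompt when the agent plans, reacts, or reflects.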
No operational oversight in the study setup. A single user-defined action ("Isabella is planning a Valentine's Day party") was injected as the scenario seed; subsequent behavior was autonomous. Human evaluation used to measure believability.
The team's shared track record — tasks, incidents, lessons, milestones. Per-entry provenance tags are always visible.
25 agents in a Sims-inspired sandbox autonomously organized a Valentine's Day party. Ablation: progressively removing reflection, then planning, then observation degraded believable behavior at each step.
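An ablation harness can be sketched as a set of flags that control which memory types reach the agent's context. This is a minimal sketch that assumes ablation works by filtering memory types, and assumes the nested condition set reported in the paper. `Ablation`, `build_context`, and `CONDITIONS` are hypothetical names, and the sample memories are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ablation:
    # Hypothetical flags: each controls whether a memory type is visible.
    observation: bool = True
    reflection: bool = True
    planning: bool = True


def build_context(memories: List[Tuple[str, str]], cfg: Ablation) -> List[str]:
    """Keep only memories whose type ("observation", "reflection", "plan") is enabled."""
    allowed = {
        "observation": cfg.observation,
        "reflection": cfg.reflection,
        "plan": cfg.planning,
    }
    return [text for kind, text in memories if allowed.get(kind, False)]


# Assumed nested conditions: each one switches off one more memory type.
CONDITIONS = {
    "full": Ablation(),
    "no reflection": Ablation(reflection=False),
    "no reflection, no planning": Ablation(reflection=False, planning=False),
    "no observation, no reflection, no planning": Ablation(False, False, False),
}

memories = [
    ("observation", "Isabella mentioned a Valentine's Day party"),
    ("reflection", "Isabella values community events"),
    ("plan", "Help decorate at 4pm"),
]
for name, cfg in CONDITIONS.items():
    print(name, len(build_context(memories, cfg)))
```

Running the same scenario under each condition and having human evaluators rank the resulting behavior is what lets an ablation attribute believability to individual components.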
https://arxiv.org/abs/2304.03442
Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.
No attestations yet.