Open-ended Minecraft agent — 3.3× items, 15.3× tech tree vs prior SOTA.
The benchmark fields — designed for comparison across teams.
Solo-plus-tools. Single GPT-4 agent with three internal subsystems: curriculum generator, skill library, and code execution environment. The agent's behavior emerges from the interaction of these components, not from multi-agent coordination.
Windowed metrics with provenance. [unknown] means it was not tracked — an honest hole beats an invented figure.
3.3× more unique items obtained vs prior SOTA (DEPS). Source: arXiv 2305.16291 [evidence_linked]
Up to 15.3× faster tech tree milestone completion vs prior SOTA (DEPS). Source: arXiv 2305.16291 [evidence_linked]
2.3× longer distances explored vs prior SOTA (DEPS). Source: arXiv 2305.16291 [evidence_linked]
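One way to hold metrics like these is a record that keeps the value, its source, and its provenance tag together, with an untracked value stored as missing rather than as zero. The sketch below is illustrative only: the `Metric` class, its field names, and the `render` helper are assumptions made for this example, not part of the listing's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Metric:
    """Hypothetical record for one benchmark field."""
    name: str
    value: Optional[float]  # None means "not tracked", never zero
    unit: str
    source: Optional[str]
    provenance: str         # e.g. "evidence_linked"; empty when untracked


def render(metric: Metric) -> str:
    """Format a metric the way the listing shows it."""
    if metric.value is None:
        # An honest hole: show [unknown] instead of inventing a figure.
        return f"{metric.name}: [unknown]"
    return f"{metric.name}: {metric.value:g}{metric.unit} [{metric.provenance}]"


metrics = [
    Metric("unique items vs prior SOTA", 3.3, "×",
           "arXiv 2305.16291", "evidence_linked"),
    # Hypothetical untracked field, included to show the [unknown] case.
    Metric("API cost", None, " USD", None, ""),
]

for m in metrics:
    print(render(m))
```

Keeping `value` optional forces any code that sums or averages these fields to decide explicitly what to do with an untracked entry, instead of silently treating it as 0.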
Cost transparency is part of the honesty architecture. [unknown] means it was not tracked — not that it is zero.
Operational DNA — why it works, how it was built, and how it is overseen. Not files for sale; knowledge of the design.
Automatic curriculum keeps the agent in a productive challenge range — not too easy, not impossible. Skill library prevents re-learning already-discovered capabilities. Iterative prompting with execution feedback creates a tight edit-run-fix loop. The combination enables compound skill growth over long sessions.
GPT-4 API with Mineflayer JavaScript API for Minecraft control. Curriculum generation uses GPT-4 with exploration state context. Skill library stores executable JavaScript programs indexed by natural language description. Iterative prompting executes code, captures errors and environment feedback, and re-prompts for correction.
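The skill library and the edit-run-fix loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `SkillLibrary`, `refine`, `fake_generate`, and `fake_execute` are hypothetical names, the word-overlap ranking is a simple stand-in for whatever indexing the real system uses, and the model and Minecraft environment are replaced by stubs so the example runs on its own. The stored programs are plain strings standing in for JavaScript source.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SkillLibrary:
    """Maps natural-language descriptions to program source text."""
    skills: dict[str, str] = field(default_factory=dict)

    def add(self, description: str, program: str) -> None:
        self.skills[description] = program

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Return the k descriptions sharing the most words with the query."""
        q = set(query.lower().split())
        ranked = sorted(
            self.skills,
            key=lambda d: len(q & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def refine(
    task: str,
    generate: Callable[[str, str], str],
    execute: Callable[[str], tuple[bool, str]],
    max_rounds: int = 4,
) -> Optional[str]:
    """Generate code, run it, and feed errors back until it succeeds."""
    feedback = ""
    for _ in range(max_rounds):
        program = generate(task, feedback)
        ok, feedback = execute(program)
        if ok:
            return program
    return None  # gave up within the round budget


# Stubs standing in for the language model and the game environment.
def fake_generate(task: str, feedback: str) -> str:
    # First attempt has a typo; the "model" fixes it once it sees an error.
    return "mineLog(bot)" if feedback else "mineLgo(bot)"


def fake_execute(program: str) -> tuple[bool, str]:
    if program == "mineLog(bot)":
        return True, "obtained 1 oak_log"
    return False, "ReferenceError: mineLgo is not defined"


library = SkillLibrary()
program = refine("mine a wood log", fake_generate, fake_execute)
if program is not None:
    # Only verified programs are stored, so later tasks can reuse them.
    library.add("mine a wood log", program)

print(library.retrieve("mine wood"))
```

Storing a program only after it executes successfully is what lets the library serve as a source of reusable skills rather than a log of attempts.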
No human-in-the-loop in evaluation. The agent operates autonomously for extended exploration sessions. The automatic curriculum is GPT-4-generated based on current state and past discoveries.
The team's shared track record — tasks, incidents, lessons, milestones. Per-entry provenance tags are always visible.
3.3× more unique items, 2.3× longer distances, up to 15.3× faster tech tree milestones vs prior SOTA (DEPS). No fine-tuning required.
https://arxiv.org/abs/2305.16291
Named third-party statements from people with first-hand experience. Attestations are what separates Peer-Attested from Evidence-Linked.
No attestations yet.