How Do We Measure This?

Measure 1: Sample Efficiency

How fast does the agent learn? We're not interested in its final, superhuman performance after 1 billion frames.

We want to know: How good is it after 1000 steps? How many attempts does it take to reach human-level performance? A human can often learn a new mobile game in just a few minutes.
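Both questions can be read off a logged learning curve. The sketch below is a minimal illustration, not a prescribed evaluation protocol: the function names, the logged `steps` and `scores` values, and the `HUMAN_LEVEL` threshold are all hypothetical placeholders.

```python
from typing import Optional, Sequence


def score_at_budget(steps: Sequence[int], scores: Sequence[float],
                    budget: int) -> Optional[float]:
    """Return the last score logged at or before `budget` steps, or None."""
    latest = None
    for step, score in zip(steps, scores):
        if step > budget:
            break
        latest = score
    return latest


def steps_to_threshold(steps: Sequence[int], scores: Sequence[float],
                       threshold: float) -> Optional[int]:
    """Return the first logged step count whose score reaches `threshold`."""
    for step, score in zip(steps, scores):
        if score >= threshold:
            return step
    return None  # the agent never reached the threshold


# Hypothetical evaluation log: (environment steps, average score).
steps = [100, 500, 1000, 5000, 10000]
scores = [2.0, 10.0, 35.0, 80.0, 120.0]
HUMAN_LEVEL = 75.0  # assumed human reference score, for illustration only

print("Score after 1000 steps:", score_at_budget(steps, scores, 1000))
print("Steps to human level:", steps_to_threshold(steps, scores, HUMAN_LEVEL))
```

Reporting both numbers side by side keeps the focus on early learning rather than on the final score.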

Measure 2: Exploration Analysis

How cleverly does the agent explore? Is it just randomly mashing buttons (like many RL agents), or is it performing targeted experiments?

By visualizing where the agent explores, we can see if it's intelligently seeking out information to confirm or deny hypotheses about its world model.
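One simple starting point for such a visualization is a table of state-visit counts. The sketch below assumes states are hashable grid cells and uses hypothetical helper names. Coverage and entropy only describe how broad the exploration is; they are a rough proxy and do not by themselves show that the agent is testing hypotheses.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def visit_counts(trajectory: Iterable[Hashable]) -> Counter:
    """Count how often each state appears in a trajectory."""
    return Counter(trajectory)


def coverage(counts: Counter, num_states: int) -> float:
    """Fraction of all states visited at least once."""
    return len(counts) / num_states


def visit_entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the visit distribution; higher is more spread out."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical trajectories on a 2x2 grid (4 states in total).
broad = [(0, 0), (0, 1), (1, 1), (1, 0)]   # visits every cell once
stuck = [(0, 0), (0, 0), (0, 0), (0, 0)]   # never leaves the start cell

for name, traj in [("broad", broad), ("stuck", stuck)]:
    counts = visit_counts(traj)
    print(name, coverage(counts, num_states=4), visit_entropy(counts))
```

The same counts can be drawn as a heatmap over the environment to see where the agent spends its time.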
