Rethinking AI

Why Current AI Tests (Still) Fail

Even many AI "generalization" benchmarks miss the point: they often test parametric novelty, not structural novelty.

Parametric Novelty: "Here's a new level of the same game." (e.g., the procedurally generated levels in OpenAI's Procgen Benchmark). The rules, objects, and physics stay identical, so the AI doesn't need to induce a new world model.

Structural Novelty: "Here's a new game with rules you've never seen." This is what humans are great at. It requires building a new world model from scratch or adapting an old one.
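To make the distinction concrete, here is a minimal sketch. It assumes each environment can be described by a rule-set identifier plus a procedural-generation seed; both fields, and the names `Env` and `novelty_kind`, are hypothetical simplifications, not how Procgen or any real benchmark encodes its levels. A held-out environment is parametrically novel if its rule set appeared in training, and structurally novel if it did not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    # Hypothetical encoding: "rules" stands for the game's rules, objects,
    # and physics; "seed" stands for the procedurally generated layout.
    rules: str
    seed: int


def novelty_kind(train: list[Env], test: Env) -> str:
    """Classify a held-out environment relative to the training set."""
    if test in set(train):
        return "none"        # exact environment was seen in training
    if test.rules in {e.rules for e in train}:
        return "parametric"  # same rules, new level: no new world model needed
    return "structural"      # unseen rules: a new world model must be induced


train = [Env("rules_A", seed) for seed in range(200)]
print(novelty_kind(train, Env("rules_A", 1000)))  # a new level of the same game
print(novelty_kind(train, Env("rules_B", 0)))     # a new game
```

Under this encoding, a test split built only from fresh seeds of training rule sets can never contain a "structural" case, however many levels it holds.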

We are not testing for world model induction. We're just testing how well a static model generalizes to slight variations.
