Why Measuring Skill Is a Trap

If you only measure performance on a task, you create a benchmark that can be "gamed" or "bought" in two ways:

1. Buy it with UNLIMITED DATA:
You can train an AI on roughly 45,000 years of self-play (like OpenAI Five for Dota 2). The AI isn't necessarily smart; it has simply seen nearly every situation it is likely to face. That is closer to memorization than intelligence.

2. Buy it with UNLIMITED PRIORS:
You can have brilliant engineers spend years hard-coding rules and heuristics (like Deep Blue for chess). The intelligence isn't in the program; it's in the engineers who built it.

Neither approach creates a generally intelligent agent. Each just creates a highly specialized tool.
