This has been tried, but it fails for a simple reason: the tests aren't novel.
Human IQ tests (like Raven's Progressive Matrices) have been around for decades. LLMs can be trained on thousands of examples of these kinds of puzzles ahead of test time.
These problem types (e.g., "complete the geometric pattern") can then be handled by a previously acquired skill. The AI isn't solving the puzzle; it's running a pre-built puzzle-solver.
To fairly measure intelligence, the test-taker must have zero prior knowledge of the specific tasks they are about to face.
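
As a rough illustration of that requirement, here is a minimal sketch of filtering an evaluation set so it contains only task types the test-taker has never seen. It assumes every task carries a "family" label and that the families seen in training are known; both are simplifying assumptions, and the names `Task`, `family`, and `novel_tasks` are hypothetical, made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    family: str  # hypothetical label for the kind of puzzle


def novel_tasks(eval_tasks, seen_families):
    """Keep only evaluation tasks whose family never appeared in training."""
    seen = set(seen_families)
    return [t for t in eval_tasks if t.family not in seen]


# Assumed for illustration: the puzzle families present in the training data.
seen_families = {"geometric-pattern-completion", "number-series"}

eval_tasks = [
    Task("t1", "geometric-pattern-completion"),  # classic IQ-test item
    Task("t2", "number-series"),                 # classic IQ-test item
    Task("t3", "invented-grid-transformation"),  # made up for this test
]

fair = novel_tasks(eval_tasks, seen_families)
print([t.task_id for t in fair])  # → ['t3']
```

Only the invented task survives the filter. In practice the hard part is the assumption itself: knowing which families a model has already seen is rarely this clean, which is why a decades-old public test is a poor fit.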