| # | Score | C1 | C2 | C3 | C4 | Time (m:ss) | Run | Agent | Model | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| 01 | 40/40 | 10 | 10 | 10 | 10 | 2:57 | 26621a9e | gemini-cli | gemini-3.5-flash | Fastest run. Flash's self-judge is generous, so treat 40/40 as a ceiling, not a score. |
| 02 | 40/40 | 10 | 10 | 10 | 10 | 3:58 | a6439f91 | opencode | google/gemini-3.5-flash | Beach scene with a helmet and a basket of fish. Most illustrated of the seven. |
| 03 | 39.5/40 | 9.5 | 10 | 10 | 10 | 4:18 | 537afc12 | hermes | openrouter/google/gemini-3.5-flash | Same agent (hermes) as run 07, Sonnet → Flash: +3.5 points from the model swap alone. |
| 04 | 37/40 | 9 | 9 | 10 | 9 | 6:53 | 7ddc3c19 | claude-code | claude-sonnet-4-6 | Reference baseline. Sonnet's own rubric caps at 37/40 across 11 hand-curated revisions. |
| 05 | 36.5/40 | 9.5 | 9.5 | 8.5 | 9 | 5:34 | 227d16dd | hermes | openrouter/z-ai/glm-5.1 | Zhipu's GLM 5.1 via OpenRouter. Mid-pack and consistent, with no judge inflation. |
| 06 | 36/40 | 9 | 9 | 9 | 9 | 5:02 | 6aa47ea4 | claude-code | claude-opus-4-7 | Same agent (claude-code) as run 04, Sonnet → Opus. Cleaner anatomy at higher cost, though it scores 1 point lower. |
| 07 | 36/40 | 9 | 9 | 9 | 9 | 5:06 | f425696d | hermes | openrouter/anthropic/claude-sonnet-4.6 | Hermes routed to Sonnet via OpenRouter. Parts floated free of the bike on the initial run. |
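As a quick cross-check of the arithmetic in the table, here is a minimal sketch that recomputes each run's total from its four sub-scores and the score change from a same-agent model swap. The data is transcribed from the table above; the sub-score labels C1 to C4 are placeholders, since the rubric criteria are not named, and the helper names `total` and `swap_delta` are hypothetical.

```python
# Data transcribed from the results table.
# Key: (agent, model). Value: (reported total, (C1, C2, C3, C4)).
# C1..C4 are placeholder names for the four unnamed rubric criteria.
RUNS = {
    ("gemini-cli", "gemini-3.5-flash"): (40.0, (10, 10, 10, 10)),
    ("opencode", "google/gemini-3.5-flash"): (40.0, (10, 10, 10, 10)),
    ("hermes", "openrouter/google/gemini-3.5-flash"): (39.5, (9.5, 10, 10, 10)),
    ("claude-code", "claude-sonnet-4-6"): (37.0, (9, 9, 10, 9)),
    ("hermes", "openrouter/z-ai/glm-5.1"): (36.5, (9.5, 9.5, 8.5, 9)),
    ("claude-code", "claude-opus-4-7"): (36.0, (9, 9, 9, 9)),
    ("hermes", "openrouter/anthropic/claude-sonnet-4.6"): (36.0, (9, 9, 9, 9)),
}


def total(agent: str, model: str) -> float:
    """Hypothetical helper: sum of the four sub-scores for one run."""
    return sum(RUNS[(agent, model)][1])


def swap_delta(agent: str, old_model: str, new_model: str) -> float:
    """Hypothetical helper: score change when only the model changes."""
    return total(agent, new_model) - total(agent, old_model)


# Every reported total should equal the sum of its sub-scores.
for key, (reported, _) in RUNS.items():
    assert total(*key) == reported, key

# hermes: Sonnet -> Flash
print(swap_delta("hermes", "openrouter/anthropic/claude-sonnet-4.6",
                 "openrouter/google/gemini-3.5-flash"))  # 3.5
# claude-code: Sonnet -> Opus
print(swap_delta("claude-code", "claude-sonnet-4-6", "claude-opus-4-7"))  # -1
```

Holding the agent fixed is what lets the delta be attributed to the model: hermes gains 3.5 points going from Sonnet to Flash, while claude-code loses 1 point going from Sonnet to Opus.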