Looks like we have a clear winner when it comes to agentically iterating on Pelican on a Bicycle (Simon’s OG benchmark).
Let Gemini 3 speak for itself:
For each iteration, I converted the SVG to a JPG using the chrome CLI and inspected the result using take_screenshot to simulate