Everyone's posting browser-agent demos this week. Click here, scroll there, fill that form. Most break by click seven.
Mine broke too, on a checkout form's submit button that the frontier vision model literally couldn't see. At $0.01-0.05 per call and 20-50 calls per agent run, the model was burning reasoning capacity on parsing pixel coordinates. A 2B specialist I trained for $4 is 2.5x more reliable at that kind of click on ScreenSpot-v2.
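To put the per-call numbers in per-run terms, here is a quick back-of-envelope sketch. It only multiplies the ranges quoted above; the function name `run_cost_range` is made up for illustration, and it assumes every call in a run is billed at the same rate.

```python
def run_cost_range(cost_per_call, calls_per_run):
    """Return the (cheapest, priciest) cost of one agent run.

    Both arguments are (low, high) tuples. Assumes a flat per-call price,
    which is a simplification of real token-based billing.
    """
    low_cost, high_cost = cost_per_call
    low_calls, high_calls = calls_per_run
    return low_cost * low_calls, high_cost * high_calls


# Ranges taken from the post: $0.01-0.05 per call, 20-50 calls per run.
low, high = run_cost_range((0.01, 0.05), (20, 50))
print(f"${low:.2f} to ${high:.2f} per agent run")
# prints "$0.20 to $2.50 per agent run"
```

That is every run, spent mostly on finding pixels, next to a one-time $4 training bill.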
The architecture is the bug, not the model.
Full write-up: https://renezander.com/blog/browser-agent-grounding-split/