Google is launching a preview of its Gemini 2.5 Computer Use model, which is tied to its Project Mariner browser agent and lets AI agents operate graphical user interfaces, primarily browsers and websites. The specialized model takes the user's request, a screenshot of the current screen, and the history of recent actions as input, then responds with a UI action such as clicking, typing, searching, or scrolling. Once that action has been executed, the cycle of analysis and execution repeats until the task is complete. According to Google's demonstrations, the model performs strongly at web and mobile UI control and outperforms competing models on browser control quality and latency. Developers can access it through the Gemini API in Google AI Studio and Vertex AI.
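The analyze-and-act loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Google's implementation or the Gemini API: `Action`, `run_agent`, `scripted_model`, and the `max_steps` limit are hypothetical names introduced here, and a scripted stand-in plays the role of the model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """A hypothetical UI action: "click", "type", "scroll", or "done"."""
    kind: str
    argument: str = ""


def run_agent(
    request: str,
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    take_screenshot: Callable[[], bytes],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat analysis and execution until the model signals completion."""
    history: List[Action] = []
    for _ in range(max_steps):  # assumed safety limit, not from the article
        screenshot = take_screenshot()
        # The model sees the request, the current screen, and past actions.
        action = propose_action(request, screenshot, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


def scripted_model(request: str, screenshot: bytes, history: List[Action]) -> Action:
    """Stand-in for the real model: replays a fixed plan step by step."""
    plan = [Action("click", "search box"), Action("type", request), Action("done")]
    return plan[min(len(history), len(plan) - 1)]


executed: List[Action] = []
history = run_agent("weather in Paris", scripted_model, executed.append, lambda: b"")
print([a.kind for a in history])  # prints ['click', 'type']
```

In a real integration, `propose_action` would call the model and `execute` would drive a browser. The step limit keeps a confused agent from looping indefinitely.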
Prepared by Jonathan Pierce and reviewed by the editorial team.