
Meet the Gemini 2.5 Computer Use Model
Google DeepMind has unveiled the Gemini 2.5 Computer Use model, an advanced AI built on the Gemini 2.5 Pro foundation. It is designed to help developers build agents that can actually use computers—just like humans do.
The model is now available in preview through the Gemini API in Google AI Studio and Vertex AI. It combines deep visual understanding with reasoning skills to navigate web and mobile interfaces with remarkable accuracy and low latency.
What Makes It Different
Until now, most AI systems relied on APIs to perform actions. But many digital tasks, such as filling out forms or managing dashboards, still depend on interacting with graphical user interfaces.
The Gemini 2.5 Computer Use model changes that by enabling agents to click, type, scroll, and manipulate dropdown menus directly on a screen. It can even operate behind logins—something most models can’t handle efficiently.
This advancement could simplify workflows where automation meets real-world interaction, making AI agents more capable of performing everyday digital tasks.
How the Model Works
At the core of this innovation lies the new `computer_use` tool in the Gemini API. Here’s how it functions:
- The model receives a user request along with a screenshot of the interface and recent actions.
- It analyzes the image, decides the next step, and generates an action—like clicking or typing.
- Once the action executes, the system captures a new screenshot and sends it back to the model.
- The process continues until the task completes or the user stops it.
This looping mechanism mimics how humans interact with computers—learning, adjusting, and completing goals iteratively.
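In code, that cycle is a simple loop. The sketch below is a minimal illustration of the screenshot-decide-act pattern described above; the `Action` type and the helper functions (`take_screenshot`, `execute`, `ask_model`) are hypothetical placeholders standing in for a real browser driver and a call to the Gemini API's `computer_use` tool, not actual SDK names:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str               # e.g. "click", "type", "scroll", "done"
    x: int = 0
    y: int = 0
    text: str = ""

def take_screenshot() -> bytes:
    """Capture the current UI state (placeholder for a real browser driver)."""
    return b""  # stub

def execute(action: Action) -> None:
    """Perform the action in the controlled environment (placeholder)."""
    pass  # stub

def ask_model(goal: str, screenshot: bytes, history: list[Action]) -> Action:
    """Send the goal, latest screenshot, and recent actions to the model,
    then parse the single action it proposes (placeholder)."""
    return Action(kind="done")  # stub: a real call would hit the Gemini API

def run_agent(goal: str, max_steps: int = 50) -> None:
    history: list[Action] = []
    for _ in range(max_steps):
        shot = take_screenshot()                 # 1. observe the interface
        action = ask_model(goal, shot, history)  # 2. model decides the next step
        if action.kind == "done":                # 3. stop when the task completes
            break
        execute(action)                          # 4. act, then loop on a fresh screenshot
        history.append(action)

run_agent("Find the cheapest flight to Paris")
```

A real harness would also cap retries and surface errors to the user, but the structure stays the same: observe, decide, act, repeat.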
Strong Performance and Speed
Performance benchmarks show that Gemini 2.5 Computer Use leads on major tests such as Online-Mind2Web, WebVoyager, and AndroidWorld.
It not only achieves higher accuracy than competitors but also delivers results faster. In tests by Browserbase, it maintained over 70% accuracy at roughly 225 seconds of latency, outperforming rival AI systems.
The model performs best in browsers but also shows strong results for mobile UIs, hinting at potential use cases in smartphone automation and app testing.
Safety and Responsible AI Use
Controlling a computer comes with serious safety risks. Google DeepMind emphasizes that the Gemini 2.5 Computer Use model is built with safety at its core.
The system has three main protection layers:
- Per-step safety checks before executing any action.
- System-level instructions allowing developers to block or confirm sensitive operations.
- Inbuilt safety training to avoid risky behaviors like bypassing security or handling private data.
Developers can configure these controls to ensure that agents act responsibly and avoid harm. Google encourages thorough testing before deploying any real-world applications.
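To make the second layer concrete, a developer-side gate might look like the sketch below. It reuses the hypothetical `Action` type and `execute` helper from the loop sketch above; the sensitive-action list and confirmation prompt are illustrative assumptions, not the Gemini API's actual safety interface:

```python
# Hypothetical guardrail wrapped around the agent loop. SENSITIVE_KINDS and
# the confirmation prompt are illustrative assumptions, not part of the
# Gemini API; the per-step safety checks described above run service-side.

SENSITIVE_KINDS = {"purchase", "send_message", "delete", "submit_payment"}

def requires_confirmation(action: Action) -> bool:
    """Flag actions a human should approve before the agent proceeds."""
    return action.kind in SENSITIVE_KINDS

def guarded_execute(action: Action) -> bool:
    """Execute the action only if it is non-sensitive or the user approves.

    Returns True if the action ran, False if it was blocked.
    """
    if requires_confirmation(action):
        answer = input(f"Agent wants to perform '{action.kind}'. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return False
    execute(action)
    return True
```

Keeping this gate in the developer's own code means a policy change is a one-line edit rather than a change to the model itself.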
Real-world Applications Already in Motion
Google has already integrated this model into several of its products and tools. Teams use it for UI testing, workflow automation, and even as a fallback mechanism for fragile test suites.
Early testers, including companies like Poke.com and Autotab, reported major improvements in task accuracy and speed. According to Autotab, Gemini 2.5 helped increase workflow performance by up to 18% on their toughest evaluations.
Google’s own payments platform team found that the model’s ability to “self-repair” failed workflows cut test failures by a significant margin.
The Bigger Picture
The launch of the Gemini 2.5 Computer Use model signals a shift toward agentic AI: systems capable of independent, goal-driven action in digital environments.
By bridging the gap between reasoning and interface interaction, it sets the stage for smarter AI assistants that can manage online tasks without human help.
While still in preview, the model’s versatility could reshape how developers design automation tools, from personal assistants to enterprise-level testing systems.