H Company's new Holo2 model takes the lead in UI Localization

Authors: Ramzi De Coster, Hamza Benchekroun, Aurélien Lac, Tony Wu, Pierre-Louis Cedoz, Kai Yuan, Mart Bakler, Antoine Bonnet, Aleix Cambray (H-AI), Ronan Riochet

Two months after releasing our first batch of Holo2 models, we at H Company are back with our largest UI localization model yet: Holo2-235B-A22B Preview. This model sets a new state-of-the-art (SOTA) record of 78.5% on ScreenSpot-Pro and 79.0% on OSWorld-G.

Available on Hugging Face, Holo2-235B-A22B Preview is a research release focused on UI element localization.

[Figure: benchmark results table]

Agentic Localization

High-resolution 4K interfaces are challenging for localization models. Small UI elements can be difficult to pinpoint on a large display. With agentic localization, however, Holo2 can iteratively refine its predictions, improving accuracy with each step and unlocking 10-20% relative gains across all Holo2 model sizes.
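To make the idea of iterative refinement concrete, here is a minimal sketch of one way such a loop could work: predict a point on the full screenshot, crop a smaller region around that point, and predict again on the crop. This zoom-and-crop strategy is an assumption made for illustration, not a description of how Holo2 refines its predictions internally. The `predict` callable, the `zoom` factor, and the toy predictor are all hypothetical; only the three-step budget comes from this article.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in screen pixels
Point = Tuple[float, float]              # (x, y) in absolute screen pixels
Predictor = Callable[[Box], Point]       # hypothetical stand-in for a localization model


def crop_around(point: Point, size: Tuple[float, float], screen: Tuple[int, int]) -> Box:
    """Return a box of the given size centered on `point`, clamped to the screen."""
    width, height = min(size[0], screen[0]), min(size[1], screen[1])
    left = min(max(point[0] - width / 2, 0.0), screen[0] - width)
    top = min(max(point[1] - height / 2, 0.0), screen[1] - height)
    return (left, top, left + width, top + height)


def refine_localization(predict: Predictor, screen: Tuple[int, int],
                        steps: int = 3, zoom: float = 0.5) -> Point:
    """Predict on the full screen, then re-predict on progressively smaller crops."""
    region: Box = (0.0, 0.0, float(screen[0]), float(screen[1]))
    point = predict(region)
    for _ in range(steps - 1):
        size = ((region[2] - region[0]) * zoom, (region[3] - region[1]) * zoom)
        region = crop_around(point, size, screen)
        point = predict(region)
    return point


# Toy predictor (assumption): its error shrinks as the visible region shrinks.
TARGET: Point = (3000.0, 1700.0)


def toy_predict(region: Box) -> Point:
    error_x = 0.02 * (region[2] - region[0])
    error_y = 0.02 * (region[3] - region[1])
    return (TARGET[0] + error_x, TARGET[1] + error_y)


single_step = refine_localization(toy_predict, (3840, 2160), steps=1)
three_steps = refine_localization(toy_predict, (3840, 2160), steps=3)
print("single step:", single_step)
print("three steps:", three_steps)
```

With the toy predictor, the three-step result lands closer to the target than the single-step result, because each crop gives the predictor a smaller region to search. A real model would not follow such a clean error pattern.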

Holo2-235B-A22B's Performance on ScreenSpot-Pro

Holo2-235B-A22B Preview reaches 70.6% accuracy on ScreenSpot-Pro in a single step. In agent mode, it achieves 78.5% within 3 steps, setting a new state-of-the-art on the most challenging GUI grounding benchmark.
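As a quick check on how these two numbers relate, the short calculation below derives the absolute and relative gains from the reported single-step and agentic accuracies. The helper name `relative_gain` is ours, introduced only for illustration.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


single_step_accuracy = 70.6  # ScreenSpot-Pro, single step (reported above)
agentic_accuracy = 78.5      # ScreenSpot-Pro, within 3 steps (reported above)

absolute = agentic_accuracy - single_step_accuracy
relative = relative_gain(single_step_accuracy, agentic_accuracy)

print(f"absolute gain: {absolute:.1f} points")  # absolute gain: 7.9 points
print(f"relative gain: {relative:.1%}")         # relative gain: 11.2%
```

The resulting relative gain of about 11.2% falls within the 10-20% range quoted for agentic localization across Holo2 model sizes.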

[Figure: cost vs. performance on ScreenSpot-Pro]

Trained with SkyPilot

Training Holo2 models at scale requires coordinating workloads across multiple cloud providers. H Company uses SkyPilot as a unified interface for launching training jobs on our clusters with Kubernetes (k8s). By abstracting away infrastructure complexity, SkyPilot lets researchers focus on model development instead of managing k8s manifests or maintaining separate deployment scripts.
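For readers unfamiliar with SkyPilot, the sketch below assembles a task specification using the field names from SkyPilot's task YAML format (`name`, `num_nodes`, `resources`, `setup`, `run`). It is not H Company's actual configuration: the task name, node count, accelerator type, and commands are hypothetical placeholders, and the snippet only builds and prints the spec without contacting any cluster.

```python
import json


def build_training_task(num_nodes: int, accelerators: str) -> dict:
    """Assemble a task spec that mirrors the fields of a SkyPilot task YAML file."""
    return {
        "name": "holo2-train",                       # hypothetical task name
        "num_nodes": num_nodes,                      # number of nodes to schedule
        "resources": {
            "cloud": "kubernetes",                   # run on a Kubernetes cluster
            "accelerators": accelerators,            # e.g. "H100:8" (hypothetical)
        },
        "setup": "pip install -r requirements.txt",  # hypothetical setup command
        "run": "python train.py",                    # hypothetical training entry point
    }


task = build_training_task(num_nodes=2, accelerators="H100:8")

# Printed as JSON, which YAML parsers generally accept as well.
print(json.dumps(task, indent=2))
```

A spec like this, saved as a YAML file, would typically be submitted with `sky launch -c <cluster-name> <task-file>`, which is the piece that replaces hand-written k8s manifests in the workflow described above.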