Integrating OSWorld into Harbor, enabling evaluation of computer-use agents on real Ubuntu and Windows desktops running in sandboxes at scale.