Research | June 9, 2026 | Embodied Global Team

VAIC: Vision-Guided Humanoid Robot Agile Object Interaction Control via Decoupled Commands

VAIC introduces a two-stage distillation framework enabling humanoid robots to perform diverse dynamic tasks like box carrying, cart interaction, and skateboarding using only onboard depth sensors and decoupled velocity commands.

Researchers have introduced VAIC (Vision-Guided Agile Interaction Control), a unified framework that enables humanoid robots to perform diverse dynamic tasks in unstructured real-world environments.

The core innovation lies in VAIC's two-stage distillation paradigm. First, a privileged teacher policy masters diverse interaction skills using precise object kinematics and exact environmental states. Second, a deployable student policy distills these capabilities while replacing full-body tracking with decoupled commands: per-frame velocity targets along multiple axes plus an interaction indicator.
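To make the student's command interface and the distillation objective concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `DecoupledCommand` fields (`vx`, `vy`, `yaw_rate`, `interacting`) and the choice of a mean-squared-error loss are hypothetical, since the article says only that the student receives velocity targets across multiple axes and a per-frame interaction indicator.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DecoupledCommand:
    """Per-frame student command. Field names and axes are assumptions."""
    vx: float          # forward velocity target, m/s (assumed axis)
    vy: float          # lateral velocity target, m/s (assumed axis)
    yaw_rate: float    # turning-rate target, rad/s (assumed axis)
    interacting: bool  # interaction indicator for this frame

    def to_vector(self) -> List[float]:
        # Flatten into the numeric form a policy network would consume.
        return [self.vx, self.vy, self.yaw_rate, 1.0 if self.interacting else 0.0]


def distillation_loss(teacher_actions: Sequence[Sequence[float]],
                      student_actions: Sequence[Sequence[float]]) -> float:
    """Mean squared error between teacher and student actions over a batch.

    MSE is an assumed stand-in; the article does not name the loss used.
    """
    if len(teacher_actions) != len(student_actions) or not teacher_actions:
        raise ValueError("batches must be non-empty and the same length")
    total, count = 0.0, 0
    for t, s in zip(teacher_actions, student_actions):
        if len(t) != len(s):
            raise ValueError("action dimensions must match")
        for a, b in zip(t, s):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("action vectors must be non-empty")
    return total / count


# One frame: walk forward while turning slightly, with interaction active.
cmd = DecoupledCommand(vx=0.5, vy=0.0, yaw_rate=0.1, interacting=True)
student_obs_command = cmd.to_vector()

# Toy batch of one 2-D action: the student misses the second dimension by 2.0.
loss = distillation_loss([[1.0, 2.0]], [[1.0, 0.0]])
```

In this framing the teacher would act on privileged state while the student sees only the flattened command plus onboard observations, and training pushes the student's actions toward the teacher's.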

Key to the system's success is its recurrent object adaptation module, which implicitly infers unobservable object dynamics from raw depth streams and proprioception, so the policy does not need perfect state observability.
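The idea of inferring hidden object dynamics from a stream of observations can be illustrated with a small recurrent cell. The sketch below is a plain Elman-style update in pure Python, chosen for brevity; the article does not describe the module's actual architecture, so the class name `RecurrentAdapter`, the weight shapes, and the use of precomputed depth features are all assumptions.

```python
import math
from typing import List, Sequence


class RecurrentAdapter:
    """Elman-style cell: h_next = tanh(W_x @ x + W_h @ h). Hypothetical stand-in."""

    def __init__(self, w_x: Sequence[Sequence[float]],
                 w_h: Sequence[Sequence[float]]) -> None:
        if len(w_x) != len(w_h):
            raise ValueError("w_x and w_h need one row per hidden unit")
        self.w_x = [list(row) for row in w_x]
        self.w_h = [list(row) for row in w_h]
        self.hidden: List[float] = [0.0] * len(self.w_h)

    def reset(self) -> None:
        # Clear the memory at the start of an episode.
        self.hidden = [0.0] * len(self.w_h)

    def step(self, depth_features: Sequence[float],
             proprioception: Sequence[float]) -> List[float]:
        # Concatenate the two observation sources into one input vector.
        x = list(depth_features) + list(proprioception)
        if any(len(row) != len(x) for row in self.w_x):
            raise ValueError("input size does not match w_x")
        new_hidden = []
        for row_x, row_h in zip(self.w_x, self.w_h):
            pre = sum(w * v for w, v in zip(row_x, x))
            pre += sum(w * h for w, h in zip(row_h, self.hidden))
            new_hidden.append(math.tanh(pre))
        self.hidden = new_hidden
        return list(new_hidden)


# Two hidden units, two depth features, one proprioceptive value (toy sizes).
adapter = RecurrentAdapter(
    w_x=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    w_h=[[0.5, 0.0], [0.0, 0.5]],
)
first = adapter.step([0.2, 0.4], [0.1])
second = adapter.step([0.2, 0.4], [0.1])  # same input, different latent
```

Because the hidden state carries over between frames, the same observation yields a different latent on the second step. That accumulated history is what would let a policy estimate quantities, such as how an object responds to contact, that no single depth frame reveals.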

Real-world evaluations demonstrate that a single VAIC policy executes highly diverse dynamic tasks, including box carrying, cart interaction, and skateboarding, and consistently outperforms baseline approaches, a step toward autonomous humanoid deployment.