[Figure: World Pilot framework diagram showing how Latent Steering and Action Steering integrate World-Action Model priors into the VLA decision chain for robotic manipulation]
Research | June 15, 2026 | Embodied Global Team

World Pilot: Steering Vision-Language-Action Models with World-Action Priors Achieves 84.7% on LIBERO-Plus

Researchers from CASIA, Nanjing University, and Beihang University propose World Pilot, a VLA framework augmented with World-Action Model priors. It achieves an 84.7% total success rate on the LIBERO-Plus zero-shot out-of-distribution (OOD) benchmark and the highest success rates across four real-robot manipulation tasks.

#VLA #World Model #Robotic Manipulation #LIBERO-Plus #World Action Model #CASIA

A team of researchers from the Institute of Automation, Chinese Academy of Sciences (CASIA), Nanjing University, and Beihang University has introduced World Pilot, a novel Vision-Language-Action (VLA) framework that integrates priors from a World-Action Model (WAM) to significantly improve robotic manipulation performance.

Traditional VLA models inherit semantic grounding from large-scale pretraining on static image-text pairs, but manipulation tasks involve continuous, contact-rich dynamics that such pretraining cannot capture. World Pilot addresses this fundamental limitation by routing WAM priors into the VLA decision chain through two complementary pathways.

Latent Steering conditions the perception layer on a scene-evolution latent, giving the model an anticipated understanding of how the environment will change. Action Steering supplies an anticipated trajectory as a motion prior to the action generator. Together, these two priors equip the VLA with both a predictive view of the scene and trajectory-level motion hints alongside its semantic conditioning.
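The two pathways can be illustrated with a toy sketch. The article does not describe how either form of steering is implemented, so everything here is an assumption made for illustration: the names `WamPriors`, `latent_steering`, `action_steering`, `toy_action_head`, and `steered_policy` are hypothetical, concatenation stands in for conditioning the perception layer, and a convex blend stands in for using the anticipated trajectory as a motion prior.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class WamPriors:
    """Hypothetical container for the two priors a World-Action Model supplies."""
    scene_latent: Vector      # anticipated scene evolution (Latent Steering input)
    trajectory: List[Vector]  # anticipated actions, one vector per time step


def latent_steering(visual_features: Vector, scene_latent: Vector) -> Vector:
    """Condition perception on the scene latent (assumed here: concatenation)."""
    return list(visual_features) + list(scene_latent)


def action_steering(policy_actions: List[Vector],
                    prior_trajectory: List[Vector],
                    prior_weight: float = 0.5) -> List[Vector]:
    """Apply the trajectory as a motion prior (assumed here: a convex blend)."""
    if len(policy_actions) != len(prior_trajectory):
        raise ValueError("policy actions and prior trajectory differ in length")
    if not 0.0 <= prior_weight <= 1.0:
        raise ValueError("prior_weight must lie in [0, 1]")
    return [
        [(1.0 - prior_weight) * a + prior_weight * p for a, p in zip(act, pri)]
        for act, pri in zip(policy_actions, prior_trajectory)
    ]


def toy_action_head(features: Vector, horizon: int, action_dim: int) -> List[Vector]:
    """Stand-in for the VLA action generator: emits the feature mean every step."""
    mean = sum(features) / len(features)
    return [[mean] * action_dim for _ in range(horizon)]


def steered_policy(visual_features: Vector,
                   priors: WamPriors,
                   prior_weight: float = 0.5) -> List[Vector]:
    """Run both pathways: steer perception first, then steer the action output."""
    conditioned = latent_steering(visual_features, priors.scene_latent)
    horizon = len(priors.trajectory)
    action_dim = len(priors.trajectory[0])
    raw_actions = toy_action_head(conditioned, horizon, action_dim)
    return action_steering(raw_actions, priors.trajectory, prior_weight)


if __name__ == "__main__":
    priors = WamPriors(scene_latent=[2.0, 2.0],
                       trajectory=[[0.0, 4.0], [2.0, 2.0]])
    print(steered_policy([1.0, 3.0], priors, prior_weight=0.5))
```

Setting `prior_weight` to 0 in this sketch ignores the motion prior while keeping the latent conditioning, which mirrors the idea that the two priors are complementary and can be supplied independently.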

A key finding is that the scene-evolution prior remains effective even when supplied by a video-pretrained world model that has not undergone action post-training, which means Latent Steering does not depend on an action-post-trained WAM and broadens the range of world models the approach can use.

World Pilot achieves a state-of-the-art total success rate of 84.7% on the LIBERO-Plus zero-shot out-of-distribution benchmark, outperforming prior methods including Cosmos Policy (79.7%) and Being-H0 (82.1%). In real-robot evaluations across four manipulation tasks, it achieves the highest success rates, with the largest performance margins observed under shifts in viewpoint, geometry, deformable state, and pose.
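For readers who want the gaps in percentage points, the short snippet below computes them from the three total success rates quoted above. The helper name `margins_over_baselines` is hypothetical, and only the figures stated in this article are used.

```python
# Total success rates (%) on LIBERO-Plus, as quoted in this article.
reported = {"World Pilot": 84.7, "Being-H0": 82.1, "Cosmos Policy": 79.7}


def margins_over_baselines(scores, method):
    """Return the percentage-point gap between `method` and every other entry."""
    ours = scores[method]
    # Round to one decimal place to match the precision of the quoted figures.
    return {name: round(ours - score, 1)
            for name, score in scores.items() if name != method}


margins = margins_over_baselines(reported, "World Pilot")
for name, gap in margins.items():
    print(f"World Pilot vs {name}: +{gap} percentage points")
```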

The project website, code, and model weights have been released to facilitate further research. The results suggest that incorporating world-action priors into VLA models is an effective way to improve generalization and robustness in robotic manipulation.
