Alibaba officially launched the Qwen-Robot series on June 16, 2026, its first complete family of embodied intelligence models within the Qwen large model ecosystem. The series comprises three core models: Qwen-RobotManip (a vision-language-action, or VLA, manipulation model), Qwen-RobotNav (a vision-language navigation, or VLN, model), and Qwen-RobotWorld (a world model). Together they are intended to provide a general-purpose intelligent foundation for robots with different body forms.
Qwen-RobotManip uses an 80-dimensional unified action representation, which enables cross-platform generalization by learning fundamental physics and manipulation logic rather than task-specific motion sequences. The model was pre-trained on more than 38,000 hours of open-source data and ranked in the top two on the RoboChallenge Table30 v1 benchmark, which spans 30 real-world tasks on 4 robot platforms and includes demanding operations such as twisting a faucet, plugging in a network cable, and pouring French fries with two arms.
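The announcement gives only the size of Qwen-RobotManip's action space, not its layout. One common way unified action spaces are built in general is to reserve fixed slots in a single vector for each kind of actuator, zero-fill the slots a given robot lacks, and carry a mask so a shared policy can ignore them. The sketch below illustrates that generic idea; the slot names and ranges in `SLOT_LAYOUT`, and the helpers `to_unified` and `from_unified`, are hypothetical and are not Qwen-RobotManip's actual representation.

```python
# Hypothetical slot layout for an 80-dimensional unified action vector.
# The real Qwen-RobotManip layout is not public in the announcement;
# these ranges are illustrative only.
UNIFIED_DIM = 80
SLOT_LAYOUT = {
    "left_arm_joints": (0, 7),
    "left_gripper": (7, 8),
    "right_arm_joints": (8, 15),
    "right_gripper": (15, 16),
    "base_velocity": (16, 19),
}


def to_unified(parts):
    """Pack robot-specific action components into one fixed-length vector.

    Returns the vector plus a boolean mask marking the slots this robot uses,
    so a shared policy (or its training loss) can skip unused dimensions.
    """
    vec = [0.0] * UNIFIED_DIM
    mask = [False] * UNIFIED_DIM
    for name, values in parts.items():
        start, stop = SLOT_LAYOUT[name]
        if len(values) != stop - start:
            raise ValueError(f"{name}: expected {stop - start} values, got {len(values)}")
        vec[start:stop] = [float(v) for v in values]
        mask[start:stop] = [True] * (stop - start)
    return vec, mask


def from_unified(vec, names):
    """Unpack only the components a given robot actually has."""
    return {name: vec[SLOT_LAYOUT[name][0]:SLOT_LAYOUT[name][1]] for name in names}


# A single-arm robot fills 8 of the 80 slots; a dual-arm robot would fill 16.
single_arm = {
    "right_arm_joints": [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2],
    "right_gripper": [1.0],
}
vec, mask = to_unified(single_arm)
print(len(vec), sum(mask))  # 80 8
```

Because every robot shares the same vector length, one model can be trained on data from many platforms; the mask is what lets a single-arm robot and a dual-arm robot coexist in the same batch.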
Qwen-RobotNav integrates five major task families, including language-guided navigation, target search, and autonomous driving, into a single framework with task-adaptive observation mechanisms. It natively supports multiple agent frameworks and has been demonstrated on Unitree Go2 quadruped robots performing autonomous object search and retrieval.
Qwen-RobotWorld is a physics-informed world model that simulates and predicts future robot states and actions, allowing trajectories to be rehearsed before execution and synthetic training data to be generated. The three models can be deployed independently or together, giving robots the ability to "walk, see, and think" at the same time.
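The announcement does not say how Qwen-RobotWorld performs pre-execution rehearsal. As a generic illustration of the idea, the sketch below uses a world model to "imagine" the outcome of many candidate action sequences and keeps the best one before anything runs on hardware, a simple technique known as random-shooting planning. The `toy_world_model` (a 2-D point mass standing in for a learned model), `rollout`, and `rehearse` are all hypothetical names chosen for this example.

```python
import random


# Stand-in for a learned world model: a toy 2-D point mass whose next
# position is the current position plus commanded velocity times dt.
# A real world model would be a learned network predicting future states.
def toy_world_model(state, action, dt=0.1):
    x, y = state
    vx, vy = action
    return (x + vx * dt, y + vy * dt)


def rollout(model, state, actions):
    """Imagine a trajectory by feeding the model's own predictions back in."""
    trajectory = [state]
    for action in actions:
        state = model(state, action)
        trajectory.append(state)
    return trajectory


def rehearse(model, state, goal, horizon=10, num_candidates=200, seed=0):
    """Random-shooting planner: sample action sequences, rehearse each in the
    model, and return the one whose predicted end state is closest to goal."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(num_candidates):
        actions = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        end = rollout(model, state, actions)[-1]
        cost = (end[0] - goal[0]) ** 2 + (end[1] - goal[1]) ** 2
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


plan, cost = rehearse(toy_world_model, state=(0.0, 0.0), goal=(0.5, -0.3))
print(f"best predicted squared error: {cost:.4f}")
```

Only the chosen plan would then be sent to the real robot; the rejected candidates cost nothing but model evaluations, which is the practical appeal of rehearsing in a world model rather than on hardware.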
Industry observers view the launch as a significant step in extending large language model capabilities from the digital world into physical robot deployment, with standardized interfaces that could reduce cross-platform migration costs for embodied intelligence systems.

