Tencent Open-Sources HyVLA-0.5: A Full-Stack Embodied VLA System
On June 15, 2026, Tencent Robotics X, Futian Laboratory, and the Hunyuan team jointly released HyVLA-0.5 (Hy-Embodied-0.5-VLA), an end-to-end embodied intelligence model for real-world robotic manipulation tasks. The entire system — model weights, training code, and datasets — is open source.
The Data Advantage: 10,000 Hours of Sub-mm Precision
Tencent developed a proprietary high-precision finger-sleeve UMI (Universal Manipulation Interface) data collection system:
- Sub-millimeter 6-DoF trajectory accuracy via optical motion capture
- Integrated force/torque sensing at fingertips
- Over 10,000 hours of first-person demonstration data
- 1 million+ episodes covering 70 task categories (kitchen, laundry, storage, cleaning, tool use, flexible object manipulation)
- Packaged as the Hy-UMI-10K dataset, of which 2,000 hours are open-sourced
Model Architecture
HyVLA-0.5 extends the Hy-Embodied-0.5 vision-language model to robot control:
- Flow matching action expert module for continuous trajectory generation
- Dual-tower structure decoupling vision-language understanding from action generation
- Compact memory encoder compressing multi-frame visual history
- Incremental action representation (rel-EE) that decouples actions from specific robot kinematics, enabling cross-embodiment transfer
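To make the flow matching and rel-EE bullets concrete, here is a minimal sketch, not the released implementation. It assumes rel-EE means per-step end-effector position deltas (orientation is omitted for brevity), and it replaces the learned action expert with a hypothetical `toy_velocity` function whose target chunk is known in advance. All function names are illustrative.

```python
import random


def toy_velocity(x, t, target):
    """Stand-in for the learned action expert (hypothetical).

    Returns the exact velocity of the straight-line path
    x_t = (1 - t) * noise + t * target, which always points at the target.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_deltas(target_deltas, steps=10, seed=0):
    """Flow-matching style sampling: Euler-integrate dx/dt = v(x, t)
    from Gaussian noise at t=0 to an action chunk at t=1."""
    flat_target = [c for delta in target_deltas for c in delta]
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in flat_target]  # start from noise
    dt = 1.0 / steps
    for k in range(steps):
        v = toy_velocity(x, k * dt, flat_target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    # Regroup the flat vector into (dx, dy, dz) deltas.
    return [tuple(x[i:i + 3]) for i in range(0, len(x), 3)]


def deltas_to_waypoints(start_xyz, deltas):
    """Accumulate relative deltas onto the current end-effector position."""
    waypoints = []
    pos = list(start_xyz)
    for delta in deltas:
        pos = [p + d for p, d in zip(pos, delta)]
        waypoints.append(tuple(pos))
    return waypoints


# Assumed example chunk: three small end-effector moves, in meters.
demo = [(0.01, 0.0, 0.0), (0.01, 0.0, 0.005), (0.0, 0.0, 0.01)]
deltas = sample_deltas(demo)
waypoints = deltas_to_waypoints((0.30, 0.00, 0.20), deltas)
```

Because the deltas are expressed relative to the current end-effector position, the same chunk can be replayed from any starting pose. Turning the waypoints into joint commands is left to each robot's own inverse kinematics, which is what makes a representation of this kind portable across embodiments.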
Training Pipeline
The system follows a four-stage pipeline:
- Continued pre-training on Hy-UMI-10K — learning general action priors
- Supervised fine-tuning — Track A (target robot) and Track B (UMI-only cross-embodiment, no teleoperation)
- FlowPRO reinforcement learning — intervention-rollback pipeline with RPRO preference loss for reward-free offline RL
- High-frequency asynchronous inference with Bezier curve smoothing
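The article does not say how the Bezier smoothing in the last stage is applied. One plausible reading, sketched here under that assumption, is that when a new action chunk arrives asynchronously, a cubic Bezier segment bridges the trajectory currently being executed and the start of the new chunk so the robot does not jerk at the handover. The `blend_chunks` helper and its control-point choice are hypothetical.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at t in [0, 1] using Bernstein weights."""
    u = 1.0 - t
    w = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    return tuple(
        w[0] * a + w[1] * b + w[2] * c + w[3] * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def blend_chunks(prev, current, new_first, new_second, samples=4):
    """Bridge the old trajectory (prev -> current) to the new chunk
    (new_first -> new_second) with one cubic Bezier segment.

    Control points are chosen so the curve leaves `current` along the old
    direction of motion and reaches `new_first` along the new one.
    """
    p1 = tuple(c + (c - p) for c, p in zip(current, prev))
    p2 = tuple(f - (s - f) for f, s in zip(new_first, new_second))
    return [
        cubic_bezier(current, p1, p2, new_first, i / samples)
        for i in range(samples + 1)
    ]


# Assumed 2-D example: old motion heads along +x, new chunk starts at (2, 1).
bridge = blend_chunks((0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0))
```

The bridge starts exactly at the current waypoint and ends exactly at the first waypoint of the new chunk, so it can be spliced between the two without a position discontinuity.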
Benchmark Results
On RoboTwin 2.0 (50 complex dual-arm tasks):
- Over 90% success rate in both Clean and Randomized settings
- SOTA among open-source VLA models
- Surpassed π0.5, LingBot-VLA, and Motus
With FlowPRO RL, real-robot success rates approach 100% on deployed tasks. The system has also been deployed on a production line in a cosmetics factory.
Significance
By open-sourcing the full stack — hardware design, dataset, model weights, and training code — Tencent is lowering barriers for embodied AI research. The use of pure UMI data (no target robot teleoperation) for cross-embodiment transfer is particularly significant, suggesting high-quality human demonstration data can substitute for expensive robot-specific data collection.


