[Figure: Tencent HyVLA-0.5 system architecture, showing the end-to-end pipeline from UMI data collection to real-robot deployment]
Research | June 15, 2026 | Embodied Global Team

Tencent Open-Sources HyVLA-0.5: End-to-End VLA Model Trained on 10,000 Hours of Sub-mm Precision Demo Data

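Tencent Robotics X has released HyVLA-0.5, an open-source end-to-end embodied VLA system trained on more than 10,000 hours of sub-millimeter-precision UMI data. It achieves state-of-the-art results among open-source VLA models on the RoboTwin 2.0 benchmark and uses FlowPRO reinforcement learning to raise real-robot success rates.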

Tencent Open-Sources HyVLA-0.5: A Full-Stack Embodied VLA System

On June 15, 2026, Tencent Robotics X, Futian Laboratory, and the Hunyuan team jointly released HyVLA-0.5 (Hy-Embodied-0.5-VLA), an end-to-end embodied intelligence model for real-world robotic manipulation. The release is open source and includes the model weights, the training code, and a 2,000-hour subset of the training data.

The Data Advantage: 10,000 Hours of Sub-mm Precision

Tencent developed an in-house, high-precision finger-sleeve UMI (Universal Manipulation Interface) data collection system:

  • Sub-millimeter 6-DoF trajectory accuracy via optical motion capture
  • Integrated force/torque sensing at fingertips
  • Over 10,000 hours of first-person demonstration data
  • 1 million+ episodes covering 70 task categories (kitchen, laundry, storage, cleaning, tool use, flexible object manipulation)
  • Collected into the Hy-UMI-10K dataset, with a 2,000-hour subset open-sourced
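The captured 6-DoF trajectories are absolute poses in the motion-capture frame, while the incremental (rel-EE) action representation listed under Model Architecture describes each step as a motion relative to the current end-effector pose. The sketch below shows one common way to compute such deltas. It assumes 4x4 homogeneous pose matrices and deltas expressed in the previous end-effector frame; every function name is hypothetical and not taken from the HyVLA-0.5 code.

```python
# Hypothetical sketch (not HyVLA-0.5 code): turn absolute 6-DoF end-effector
# poses into incremental actions and back. Poses are 4x4 homogeneous
# transforms stored as nested lists of floats.
import math
from typing import List

Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(t: Mat) -> Mat:
    """Invert a rigid transform [R p; 0 1] as [R^T  -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # R transposed
    p = [t[i][3] for i in range(3)]
    q = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [q[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def pose(yaw: float, x: float, y: float, z: float) -> Mat:
    """Pose rotated by `yaw` radians about z and translated to (x, y, z)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x], [s, c, 0.0, y], [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def to_relative(poses: List[Mat]) -> List[Mat]:
    """Delta T_t^-1 @ T_{t+1} between consecutive poses (previous EE frame)."""
    return [matmul(rigid_inverse(a), b) for a, b in zip(poses, poses[1:])]


def from_relative(start: Mat, deltas: List[Mat]) -> List[Mat]:
    """Re-integrate deltas from any start pose, e.g. another arm's EE pose."""
    out = [start]
    for d in deltas:
        out.append(matmul(out[-1], d))
    return out


# Usage: replay a demonstrated motion from a different starting pose.
demo = [pose(0.0, 0.0, 0.0, 0.0), pose(0.1, 0.01, 0.0, 0.0)]
replayed = from_relative(pose(0.5, 0.3, 0.1, 0.2), to_relative(demo))
```

Because each delta is inv(T_t) · T_{t+1}, applying the same rigid transform to every pose (for example, placing the robot base elsewhere) leaves the deltas unchanged. That invariance is what lets one demonstrated trajectory drive differently placed arms.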

Model Architecture

HyVLA-0.5 extends the Hy-Embodied-0.5 vision-language model to robotics control:

  • Flow matching action expert module for continuous trajectory generation
  • Dual-tower structure decoupling vision-language understanding from action generation
  • Compact memory encoder compressing multi-frame visual history
  • Incremental action representation (rel-EE, relative end-effector actions) that decouples actions from any specific robot's kinematics, enabling cross-embodiment transfer
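A flow-matching action expert generates an action chunk by integrating a learned velocity field from noise toward the action. The minimal sketch below assumes the common linear (rectified-flow) interpolation path and plain Euler integration. An oracle velocity aimed at a known target stands in for the trained network, the chunk is flattened into one vector, and none of the names come from HyVLA-0.5.

```python
# Minimal flow-matching sketch (linear / rectified-flow path); not HyVLA code.
# The learned action expert is replaced by an oracle velocity field so the
# sampler runs without a trained network.
import random
from typing import Callable, List, Tuple

Vec = List[float]


def training_pair(x0: Vec, x1: Vec, t: float) -> Tuple[Vec, Vec]:
    """Point x_t = (1-t)*x0 + t*x1 on the path and its target velocity x1 - x0."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    u = [b - a for a, b in zip(x0, x1)]
    return xt, u


def fm_loss(v_pred: Vec, u: Vec) -> float:
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, u)) / len(u)


def sample(velocity: Callable[[Vec, float], Vec], dim: int, steps: int = 10,
           seed: int = 0) -> Vec:
    """Euler-integrate dx/dt = v(x, t) from Gaussian noise at t=0 to t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):  # t takes values 0, dt, ..., 1 - dt; never 1
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(target: Vec) -> Callable[[Vec, float], Vec]:
    """Stand-in for the network: exact path velocity toward one known action."""
    return lambda x, t: [(a - xi) / (1.0 - t) for a, xi in zip(target, x)]


# Usage: recover a 3-D action from noise with the oracle field.
action = sample(oracle_velocity([0.1, -0.2, 0.05]), dim=3)
```

With the oracle field, Euler integration lands on the target at t = 1 up to rounding. A trained expert would instead regress its prediction toward x1 - x0 with a loss like `fm_loss` and would be conditioned on the vision-language features, which this sketch omits.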

Training Pipeline

The system follows a four-stage pipeline:

  1. Continuous pre-training on Hy-UMI-10K — learning general action priors
  2. Supervised fine-tuning — Track A (target robot) and Track B (UMI-only cross-embodiment, no teleoperation)
  3. FlowPRO reinforcement learning — intervention-rollback pipeline with RPRO preference loss for reward-free offline RL
  4. High-frequency asynchronous inference with Bezier curve smoothing
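With asynchronous inference, a new action chunk can arrive while the previous one is still executing, and switching chunks naively can cause a jump in the commanded trajectory. The following is a hypothetical sketch of how a cubic Bézier bridge could smooth that handoff. It assumes waypoints are float vectors at a fixed control rate, that `new[0]` is the waypoint for the tick right after `old[-1]`, and that end tangents are estimated by finite differences. HyVLA-0.5's actual smoothing scheme may differ.

```python
# Hypothetical sketch: cubic Bezier bridge between two action chunks during
# asynchronous inference. Not HyVLA-0.5's implementation.
from typing import List

Vec = List[float]


def cubic_bezier(p0: Vec, p1: Vec, p2: Vec, p3: Vec, t: float) -> Vec:
    """B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3."""
    s = 1.0 - t
    w0, w1, w2, w3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return [w0 * a + w1 * b + w2 * c + w3 * d
            for a, b, c, d in zip(p0, p1, p2, p3)]


def bridge(old: List[Vec], new: List[Vec], steps: int) -> List[Vec]:
    """Replace the first `steps` waypoints of `new` with a smooth bridge.

    Assumes new[0] is the waypoint for the tick right after old[-1], so
    new[steps] lies T = steps + 1 ticks after old[-1].
    """
    if len(old) < 2 or not 1 <= steps < len(new):
        raise ValueError("need len(old) >= 2 and 1 <= steps < len(new)")
    T = steps + 1
    p0, p3 = old[-1], new[steps]
    # Per-tick velocities from finite differences at each end.
    v_old = [a - b for a, b in zip(old[-1], old[-2])]
    v_new = [a - b for a, b in zip(new[steps], new[steps - 1])]
    # B'(0) = 3 (p1 - p0) should equal T * v_old because t spans T ticks;
    # likewise B'(1) = 3 (p3 - p2) should equal T * v_new.
    p1 = [p + T * v / 3.0 for p, v in zip(p0, v_old)]
    p2 = [p - T * v / 3.0 for p, v in zip(p3, v_new)]
    bridged = [cubic_bezier(p0, p1, p2, p3, i / T) for i in range(1, T)]
    return bridged + new[steps:]
```

Matching the Bézier end tangents to the finite-difference velocities of both chunks keeps velocity approximately continuous at each join. If both chunks already lie on one constant-velocity line, the bridge reproduces that line unchanged.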

Benchmark Results

On RoboTwin 2.0 (50 complex dual-arm tasks):

  • Over 90% success rate in both Clean and Randomized settings
  • SOTA among open-source VLA models
  • Outperforms π0.5, LingBot-VLA, and Motus
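To make a suite-level figure such as "over 90%" concrete, the helper below computes a success rate for one setting from per-task results. It assumes an unweighted mean of per-task success rates computed from (successes, trials) counts; this convention is an assumption rather than RoboTwin 2.0's published code, and the task names and counts are placeholders, not benchmark data.

```python
# Illustrative only: suite-level success rate as an unweighted mean of
# per-task success rates (assumed convention, not RoboTwin's evaluator).
from typing import Dict, Tuple


def suite_success_rate(results: Dict[str, Tuple[int, int]]) -> float:
    """results maps task name -> (successes, trials); returns the mean rate."""
    if not results:
        raise ValueError("no tasks")
    rates = []
    for task, (succ, trials) in results.items():
        if trials <= 0 or not 0 <= succ <= trials:
            raise ValueError(f"bad counts for {task}: {succ}/{trials}")
        rates.append(succ / trials)
    return sum(rates) / len(rates)


# Placeholder per-task counts for two settings (not real benchmark results).
clean = {"task_a": (96, 100), "task_b": (88, 100)}
randomized = {"task_a": (91, 100), "task_b": (90, 100)}
for name, res in (("Clean", clean), ("Randomized", randomized)):
    rate = suite_success_rate(res)
    print(f"{name}: {rate:.1%} (above 90%: {rate > 0.9})")
```

An unweighted mean gives every task equal say regardless of how many trials it ran; pooling all successes over all trials would instead weight tasks by trial count.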

With FlowPRO RL, real-robot success rates on the deployed tasks approach 100%. The system has also been deployed on a production line at a cosmetics factory.

Significance

By open-sourcing the full stack (hardware design, a 2,000-hour dataset subset, model weights, and training code), Tencent is lowering the barriers to embodied AI research. The use of pure UMI data, with no teleoperation on the target robot, for cross-embodiment transfer is particularly notable: it suggests that high-quality human demonstration data may substitute for expensive robot-specific data collection.
