Researchers from Galbot and Tsinghua University have unveiled Humanoid-GPT, an approach to humanoid robot motion tracking that combines large-scale motion data with a generative Transformer architecture.
The Scaling Revolution in Motion Tracking
Humanoid-GPT is pre-trained on a 2-billion-frame retargeted motion corpus that unifies all major motion capture datasets, including Lafan1, AMASS, Motion-X++, PHUMA, and MotionMillion, along with large-scale in-house recordings. The corpus is more than 200× larger than the training sets of prior trackers.
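Taking the stated figures at face value (about 2 billion frames, more than 200× larger), a quick calculation gives an implied upper bound on the size of prior tracker training sets:

```latex
N_{\text{prior}} < \frac{N_{\text{Humanoid-GPT}}}{200} = \frac{2 \times 10^{9}}{200} = 10^{7}\ \text{frames}
```

In other words, the training sets used by prior trackers held fewer than roughly 10 million frames.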
Breaking the Agility-Generalization Trade-off
Prior motion trackers suffer from an agility-generalization trade-off. Trackers that excel on agile motions often fail on unseen styles, while trackers that generalize well underfit complex dynamics. Humanoid-GPT breaks this trade-off by systematically scaling both training data and model size.
Key Results
- Zero-shot generalization to unseen motions and control tasks
- Tracking of highly dynamic behaviors with unprecedented precision
- Single generative Transformer replacing multiple specialized trackers
- A success rate of 96% or higher in real-robot deployment tests
Architecture Highlights
The model uses GPT-style causal attention, so each prediction depends only on current and past observations. This meets the constraints of online tracking deployment and allows real-time inference. Unlike shallow MLP trackers, the Transformer architecture continues to improve as data and model scale grow.
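The sketch below shows why causal attention suits online tracking. It is a minimal, single-head causal self-attention written in plain Python. It is not Humanoid-GPT's implementation. The function name `causal_attention`, the toy 2-D "frames", and the omission of learned query/key/value projections, multiple heads, and stacked layers are all simplifying assumptions. The property it demonstrates is that the output at time step t depends only on frames 0..t. Appending a new frame therefore never changes earlier outputs, which is what lets a tracker run step by step as observations arrive.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def causal_attention(queries: List[Vector], keys: List[Vector],
                     values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with a causal mask.

    Hypothetical illustration: position t attends only to positions 0..t,
    so no output ever depends on a future frame.
    """
    d = len(queries[0])
    outputs: List[Vector] = []
    for t, q in enumerate(queries):
        # Causal mask: only keys at positions s <= t are scored at all.
        scores = [dot(q, keys[s]) / math.sqrt(d) for s in range(t + 1)]
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(x - m) for x in scores]
        total = sum(weights)
        weights = [w / total for w in weights]  # softmax over visible steps
        # Weighted sum of the visible value vectors, one output dimension at a time.
        out = [sum(w * values[s][j] for s, w in enumerate(weights))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Toy "motion frames" used directly as queries, keys, and values (assumption).
    frames = [[0.1, 0.0], [0.3, -0.2], [0.5, 0.4]]
    history = causal_attention(frames, frames, frames)
    extended = frames + [[9.0, 9.0]]
    # Appending a future frame leaves the outputs for earlier steps unchanged.
    print(causal_attention(extended, extended, extended)[:3] == history)  # prints True
```

A bidirectional (unmasked) attention layer would fail this check, because every output would shift whenever a new frame arrived. That behavior is incompatible with issuing control commands in real time.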
Open Source
Code and the project page are available at: https://github.com/GalaxyGeneralRobotics/Humanoid-GPT/

