Researchers have introduced OMG (Omni-Modal Motion Generation), a comprehensive framework for universal humanoid robot control. The work addresses a fundamental challenge in robotics: enabling flexible, adaptive control that can leverage multiple types of input signals to generate natural, full-body motion.
The core architecture is modeled on the division of labor in biological motor systems and has two levels. A scalable "brain" module accepts multi-modal condition inputs, including natural language commands, audio signals, and human reference motions, and generates motion plans. A reactive, motion-tracking "cerebellum" module then executes those plans precisely. This hierarchical separation lets the system stay flexible at the high level while remaining precise at the low level.
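The two-level split can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from OMG itself: the `Conditions` container, the `plan_motion` and `track` functions, and the proportional tracking rule are hypothetical stand-ins for a learned planner and a learned tracking controller.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Conditions:
    """Hypothetical container: any subset of the three modalities may be set."""
    text: Optional[str] = None
    audio: Optional[Sequence[float]] = None
    reference_motion: Optional[Sequence[Sequence[float]]] = None


def plan_motion(cond: Conditions, horizon: int, dof: int) -> List[List[float]]:
    """Stand-in for the high-level 'brain': returns a reference trajectory.

    A real planner would be a learned generative model. This placeholder
    copies the reference motion if one is given, otherwise holds a zero pose.
    """
    if cond.reference_motion:
        frames = [list(f) for f in cond.reference_motion][:horizon]
        # Pad by repeating the last frame if the reference is too short.
        while len(frames) < horizon:
            frames.append(list(frames[-1]))
        return frames
    return [[0.0] * dof for _ in range(horizon)]


def track(plan: List[List[float]], state: List[float], gain: float = 0.5) -> List[List[float]]:
    """Stand-in for the low-level 'cerebellum': step toward each target frame.

    Uses a simple proportional update, not a learned tracking policy.
    """
    executed = []
    for target in plan:
        state = [s + gain * (t - s) for s, t in zip(state, target)]
        executed.append(state)
    return executed


if __name__ == "__main__":
    cond = Conditions(reference_motion=[[1.0, 1.0]])
    plan = plan_motion(cond, horizon=3, dof=2)
    print(track(plan, state=[0.0, 0.0]))
```

The point of the sketch is the interface: the planner only needs to emit a trajectory, and the tracker only needs to follow one, so either side could be swapped or retrained without changing the other.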
To obtain high-quality training data, the researchers built a dedicated data curation and annotation pipeline. The motion generation backbone is diffusion-based and takes language, audio, and human reference motion as conditioning inputs. The reported experiments show OMG achieving state-of-the-art performance as a full-modal, full-body controller, with favorable model scaling behavior and efficient adaptation to new distributions and modalities.
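As a rough illustration of conditional diffusion sampling with optional modalities, the sketch below fuses whichever condition embeddings are present and runs a deterministic DDIM-style reverse process. The summary does not describe OMG's fusion mechanism, sampler, or network, so every choice here is an assumption: the averaging in `fuse_conditions`, the linear noise schedule, and `toy_noise_predictor`, which is a hand-written placeholder for a learned network and simply pretends the clean sample equals the condition vector.

```python
import math
import random
from typing import Dict, List, Optional

DIM = 4  # size of the shared conditioning vector (arbitrary for this sketch)


def fuse_conditions(embeddings: Dict[str, Optional[List[float]]]) -> List[float]:
    """Average the embeddings of the modalities that are present.

    Missing modalities (None) are skipped, so any subset of inputs is valid.
    With no modality at all, fall back to a zero 'null' condition.
    """
    present = [e for e in embeddings.values() if e is not None]
    if not present:
        return [0.0] * DIM
    return [sum(vals) / len(present) for vals in zip(*present)]


def toy_noise_predictor(x_t: List[float], alpha_bar: float, cond: List[float]) -> List[float]:
    """Placeholder for a learned network eps_theta(x_t, t, cond).

    It assumes the clean sample equals the condition vector and returns
    the noise that would explain x_t under that assumption.
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - a * c) / s for x, c in zip(x_t, cond)]


def ddim_sample(cond: List[float], steps: int = 10, seed: int = 0) -> List[float]:
    """Deterministic DDIM-style reverse process conditioned on `cond`."""
    rng = random.Random(seed)
    # Cumulative signal levels, from nearly pure noise up to fully clean (1.0).
    alpha_bars = [0.01 + 0.99 * i / steps for i in range(steps)] + [1.0]
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # start from Gaussian noise
    for i in range(steps):
        ab_t, ab_next = alpha_bars[i], alpha_bars[i + 1]
        eps = toy_noise_predictor(x, ab_t, cond)
        # Estimate the clean sample implied by the predicted noise.
        x0_pred = [(xi - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
                   for xi, e in zip(x, eps)]
        # Move to the next, less noisy level without injecting fresh noise.
        x = [math.sqrt(ab_next) * x0 + math.sqrt(1.0 - ab_next) * e
             for x0, e in zip(x0_pred, eps)]
    return x


if __name__ == "__main__":
    cond = fuse_conditions({
        "text": [1.0, 0.0, 0.0, 0.0],
        "audio": None,  # modality not provided
        "motion": [0.0, 1.0, 0.0, 0.0],
    })
    print(ddim_sample(cond))
```

Because the placeholder predictor treats the condition as the clean sample, the sampler converges to the fused condition vector. In a trained model the network would instead map the noisy motion and the conditions to a plausible motion sequence.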
