EgoEMG data collection system showing bilateral EMG wristbands, head-mounted first-person RGB camera, external RGB-D camera, and optical motion capture markers for hand pose annotation
Research · June 18, 2026 · Embodied Global Team

Tsinghua University and Shouyi Tech Release EgoEMG: A Breakthrough Multimodal Hand Pose Dataset for Embodied AI's 'Last Centimeter'

Tsinghua University's Department of Automation and Shouyi Tech have jointly released EgoEMG, the industry's first multimodal egocentric hand pose dataset to synchronize EMG, vision, depth, and motion data. The dataset provides a unified benchmark, and results on it indicate that vision-dominant multimodal fusion is the most effective path to precise hand perception.


A research team from Tsinghua University's Department of Automation, in collaboration with Shouyi Technology, has released EgoEMG — the industry's first multimodal egocentric dataset that simultaneously provides EMG (surface electromyography), vision, depth, and motion data for hand pose estimation, all time-synchronized under a unified protocol.

The dataset and accompanying paper (arXiv:2605.05712) address a critical gap in embodied intelligence: the lack of a unified benchmark for comparing vision-based and EMG-based approaches to hand perception. Precise hand perception, the 'last centimeter' of embodied AI, is fundamental to enabling robots to perform dexterous manipulation tasks.

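EgoEMG features 41 participants and more than 10 hours of synchronized multimodal data. Its hand pose annotations come from a learning-based markers2mano pipeline, which fits the MANO parametric hand model to the optical motion-capture markers. The pipeline reduces the invalid-frame rate from 12.7% (Meta's EMG2Pose baseline) to 3.6% and achieves a MANO-to-marker alignment error of just 4.3 mm.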
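Both figures are per-frame statistics over the annotated recordings. The sketch below shows one plausible way to compute them, assuming the fitted MANO hand yields a predicted 3D position (in millimetres) for each physical marker. The function names and the rule that a frame counts as invalid when a marker is missing or its mean error exceeds a 10 mm threshold are hypothetical; the paper's exact definitions may differ.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]
# One frame: (MANO-derived marker positions, measured marker positions or None if occluded).
Frame = Tuple[Sequence[Point], Sequence[Optional[Point]]]


def alignment_error_mm(pred: Sequence[Point], measured: Sequence[Point]) -> float:
    """Mean Euclidean distance (mm) between MANO-derived and measured marker positions."""
    dists = [math.dist(p, m) for p, m in zip(pred, measured)]
    return sum(dists) / len(dists)


def invalid_frame_rate(frames: Sequence[Frame], max_error_mm: float = 10.0) -> float:
    """Fraction of frames flagged invalid.

    Hypothetical criterion: a frame is invalid if any marker is missing (None)
    or its mean alignment error exceeds max_error_mm.
    """
    invalid = 0
    for pred, measured in frames:
        # The missing-marker check short-circuits before the distance computation.
        if any(m is None for m in measured) or alignment_error_mm(pred, measured) > max_error_mm:
            invalid += 1
    return invalid / len(frames)


# Toy usage: two markers off by 3 mm and 4 mm give a mean error of 3.5 mm.
print(alignment_error_mm([(0, 0, 0), (10, 0, 0)], [(0, 0, 3), (10, 4, 0)]))  # 3.5
```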

Key findings from the research:

  1. Pose error from EMG alone is 2.4 times that of vision alone, a structural limitation reflecting how much information the EMG signal carries
  2. Cross-user generalization remains a fundamental challenge for pure EMG approaches
  3. Vision-dominant multimodal fusion achieves the best results, with EMG serving as a complementary modality for occluded scenarios
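Finding 2 concerns cross-user generalization, which is typically measured with a participant-disjoint split: every recording from a held-out user goes to the test set, so the model never sees that person's muscle activation patterns during training. The following is a minimal sketch, assuming participants are identified by integer IDs 1 to 41; the split size, seed, and function names are illustrative and not the paper's published protocol.

```python
import random
from typing import Iterable, Set, Tuple


def cross_user_split(participant_ids: Iterable[int], n_test: int, seed: int = 0) -> Tuple[Set[int], Set[int]]:
    """Assign whole participants (not individual frames) to train or test."""
    ids = sorted(set(participant_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    return set(ids[n_test:]), set(ids[:n_test])


def split_samples(samples, train_users, test_users):
    """Route (participant_id, sample) pairs by owner so no test user leaks into training."""
    samples = list(samples)  # materialize so generators can be iterated twice
    train = [s for pid, s in samples if pid in train_users]
    test = [s for pid, s in samples if pid in test_users]
    return train, test


# Hypothetical example: hold out 8 of 41 participants.
train_users, test_users = cross_user_split(range(1, 42), n_test=8)
print(len(train_users), len(test_users))  # 33 8
```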

The team also designed EMGFormer, a new EMG-to-pose architecture that improves on the previous state of the art by 22% on the hardest generalization subset. For EMG+vision integration, they introduced a residual fusion architecture in which the EMG branch learns only what vision cannot see, such as finger flexion during occlusion.
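The residual idea can be illustrated with a deliberately tiny model. This sketch assumes a single finger-flexion angle per frame, a vision estimate that collapses to 0 when the finger is occluded, an explicit occlusion flag, and a one-dimensional EMG feature. The EMG branch is a straight line fitted only to the vision residual (target minus vision estimate) on occluded frames. All names and the linear form are illustrative stand-ins; the actual architecture is a learned network that has to infer occlusion on its own.

```python
from dataclasses import dataclass


def fit_line(xs, ys):
    """Ordinary least squares for ys ≈ w * xs + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        raise ValueError("EMG feature has no variance on occluded frames")
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = cov / var
    return w, my - w * mx


@dataclass
class ResidualFusion:
    w: float = 0.0
    b: float = 0.0

    def fit(self, vision_pred, emg_feat, occluded, target):
        # Fit the EMG correction to the vision residual (target - vision), occluded frames only.
        xs = [e for e, o in zip(emg_feat, occluded) if o]
        rs = [t - v for t, v, o in zip(target, vision_pred, occluded) if o]
        self.w, self.b = fit_line(xs, rs)
        return self

    def predict(self, vision_pred, emg_feat, occluded):
        # Vision stays the primary estimate; EMG adds a correction only where vision is blind.
        return [v + (self.w * e + self.b if o else 0.0)
                for v, e, o in zip(vision_pred, emg_feat, occluded)]


target = [10, 20, 30, 40, 50, 60]
occluded = [False, False, False, True, True, True]
vision = [10, 20, 30, 0, 0, 0]    # vision is blind on the last three frames
emg = [5, 10, 15, 20, 25, 30]     # toy EMG feature proportional to flexion
model = ResidualFusion().fit(vision, emg, occluded, target)
print(model.predict(vision, emg, occluded))  # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
```

Because the EMG term is added to the vision output rather than replacing it, the fused prediction falls back to vision wherever the correction is zero, which matches the vision-dominant behaviour the findings describe.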

EgoEMG establishes three benchmark tasks — EMG→pose, vision→pose, and EMG+vision fusion — providing a standardized evaluation protocol that positions vision-dominant multimodal fusion as the most promising path forward for precise hand perception in embodied AI.
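A shared protocol means every task is scored on the same held-out samples with the same metric, so the resulting numbers are directly comparable. The sketch below assumes mean per-joint position error (MPJPE) as that metric and represents each task as a function from a sample to a predicted pose. The metric choice, the sample field names, and the task keys are assumptions, not the benchmark's published specification.

```python
import math
from typing import Callable, Dict, List, Sequence

Joint = Sequence[float]
Pose = Sequence[Joint]


def mpjpe(pred: Pose, gt: Pose) -> float:
    """Mean per-joint position error: average Euclidean distance over joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def evaluate(tasks: Dict[str, Callable[[dict], Pose]], test_set: List[dict]) -> Dict[str, float]:
    """Score every task's predictor on the same samples with the same metric."""
    return {
        name: sum(mpjpe(predict(s), s["pose"]) for s in test_set) / len(test_set)
        for name, predict in tasks.items()
    }


# Hypothetical single-joint sample with precomputed predictions per modality.
test_set = [{"pose": [(0, 0, 0)], "emg": [(0, 0, 4)], "vision": [(0, 0, 1)]}]
tasks = {
    "emg_to_pose": lambda s: s["emg"],
    "vision_to_pose": lambda s: s["vision"],
}
print(evaluate(tasks, test_set))  # {'emg_to_pose': 4.0, 'vision_to_pose': 1.0}
```

With all tasks scored on a common test set, a comparison such as finding 1 reduces to a ratio of two entries, for example `scores["emg_to_pose"] / scores["vision_to_pose"]`.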
