Research · June 14, 2026 · Embodied Global

Ant Group Tianji Lab Unveils AoE: A Scalable System for Egocentric Embodied AI Data Collection Using Consumer Phones

Ant Group's AoE system turns consumer smartphones into embodied AI data collectors through a "light frontend, heavy backend" design. Mixing robot-free and real-robot demonstrations at roughly 10:1 is reported to cut real-robot data needs by up to 20 times.


Ant Group's Tianji Lab has unveiled AoE (Always-on Egocentric Human Video Collection for Embodied AI), a system designed to ease the data bottleneck that limits embodied AI development. It turns everyday consumer smartphones into embodied data collection devices.

The core challenge in embodied AI today is the scarcity of high-quality training data. Traditional approaches face fundamental limitations:

  • Physical Robot Teleoperation: High precision but extremely expensive, with typical data collection facilities producing only 10,000 to 100,000 hours annually
  • Handheld Grippers (UMI-style): Lower hardware costs but still require dedicated operators, limiting scalability
  • First-Person Video: Extremely low cost but lacks action trajectories (3D hand poses, camera poses)

AoE takes the third approach, first-person video, and industrializes the post-processing pipeline that turns raw video into usable training data by recovering the missing action trajectories.

Design Philosophy: Light Frontend, Heavy Backend

The frontend is deliberately kept simple and affordable:

  • Device: User's own smartphone plus a $2.80 neck-mounted bracket
  • Personnel: Real workers in their actual job positions (car washers, mechanics, chefs, cashiers)
  • Collection: The app automatically detects hand-object interactions, so no manual start or stop is required
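
The hands-free collection step can be illustrated with a minimal sketch. It assumes the app produces a per-frame hand-object interaction score between 0 and 1 and opens or closes a clip with two thresholds (hysteresis). The score, the thresholds, and the name `segment_interactions` are illustrative assumptions, not details published for AoE.

```python
from typing import List, Sequence, Tuple


def segment_interactions(
    scores: Sequence[float],
    start_thresh: float = 0.6,  # assumed: score needed to open a clip
    stop_thresh: float = 0.4,   # assumed: score below which a clip closes
    min_len: int = 3,           # assumed: discard clips shorter than this
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, of detected interactions."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if start is None and score >= start_thresh:
            start = i  # interaction begins: start recording
        elif start is not None and score < stop_thresh:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # interaction ends: stop recording
    # Close a clip that is still open when the stream ends.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))
    return segments


# Hypothetical per-frame scores from an on-device detector.
scores = [0.1, 0.7, 0.8, 0.5, 0.9, 0.2, 0.1, 0.7, 0.1]
print(segment_interactions(scores))  # → [(1, 5)]
```

Using a lower stop threshold than start threshold keeps a clip from being cut every time the score dips briefly, and the minimum length drops one-frame false positives such as frame 7 above.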

The backend uses sophisticated algorithms to fill in missing trajectories:

  • 3D hand reconstruction from monocular video (MANO parameters)
  • 6DoF camera trajectory estimation (SLAM plus depth priors)
  • Action semantic annotation via multimodal LLMs
  • Triple quality inspection (edge-side plus cloud plus human sampling)
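
To make the backend outputs concrete, here is a hedged sketch of what one annotated frame and the three-stage inspection gate might look like. The field names, the quaternion camera representation, the 0.8 cloud threshold, and the function `passes_inspection` are assumptions for illustration. The only sizes taken as given are the standard MANO dimensions (48 pose values including global rotation, 10 shape coefficients).

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MANO_POSE_DIM = 48   # 3 global rotation + 15 joints x 3 axis-angle values
MANO_SHAPE_DIM = 10  # MANO shape coefficients


@dataclass
class FrameAnnotation:
    """Hypothetical per-frame record produced by the backend."""
    timestamp_s: float
    hand_pose: Sequence[float]        # MANO pose parameters
    hand_shape: Sequence[float]       # MANO shape parameters
    cam_translation: Sequence[float]  # camera position (x, y, z)
    cam_quaternion: Sequence[float]   # camera orientation (w, x, y, z)
    action_label: str                 # semantic annotation from a multimodal LLM

    def is_well_formed(self) -> bool:
        """Check array sizes and that the quaternion is close to unit length."""
        quat_norm_sq = sum(q * q for q in self.cam_quaternion)
        return (
            len(self.hand_pose) == MANO_POSE_DIM
            and len(self.hand_shape) == MANO_SHAPE_DIM
            and len(self.cam_translation) == 3
            and len(self.cam_quaternion) == 4
            and abs(quat_norm_sq - 1.0) < 1e-3
        )


def passes_inspection(
    edge_ok: bool,
    cloud_score: float,
    human_verdict: Optional[bool],  # None means the clip was not sampled
    cloud_thresh: float = 0.8,      # assumed threshold
) -> bool:
    """Accept a clip only if the edge and cloud checks pass and no sampled human review rejected it."""
    if not edge_ok or cloud_score < cloud_thresh:
        return False
    return human_verdict is not False


frame = FrameAnnotation(
    timestamp_s=0.0,
    hand_pose=[0.0] * MANO_POSE_DIM,
    hand_shape=[0.0] * MANO_SHAPE_DIM,
    cam_translation=[0.0, 0.0, 0.0],
    cam_quaternion=[1.0, 0.0, 0.0, 0.0],
    action_label="wipe windshield",
)
print(frame.is_well_formed(), passes_inspection(True, 0.9, None))
```

Because human review is applied only to a sample, the gate treats "not sampled" as a pass and only an explicit rejection as a failure.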

Key Technical Innovations: a Data Map plus Task Distribution system, an Automated Quality Inspection Flywheel, and Heterogeneous Device Adaptation.

Results and Impact: Combining approximately 10 robot-free demonstrations with each real-robot demonstration reportedly delivers performance comparable to models trained entirely on physical robot data, reducing real-robot data needs by up to 20 times. The system offers over 2,000 hours of validated multimodal demonstrations, with strong zero-shot transfer reported across different robot platforms.
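
The 10:1 ratio can be pictured as a batch-sampling rule. The sketch below assumes the ratio is applied per training batch, which the source does not specify. The function name `sample_mixed_batch` and the string placeholders for demonstrations are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_mixed_batch(
    robot_free: Sequence[str],
    real_robot: Sequence[str],
    batch_size: int,
    ratio: int = 10,  # robot-free demonstrations per real-robot demonstration
    seed: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Draw a batch with roughly `ratio` robot-free items per real-robot item."""
    rng = random.Random(seed)
    # Always keep at least one real-robot demonstration in the batch.
    n_real = max(1, round(batch_size / (ratio + 1)))
    n_free = batch_size - n_real
    batch = [("real_robot", rng.choice(real_robot)) for _ in range(n_real)]
    batch += [("robot_free", rng.choice(robot_free)) for _ in range(n_free)]
    rng.shuffle(batch)
    return batch


# Placeholder demonstration identifiers.
free_demos = [f"phone_clip_{i}" for i in range(100)]
real_demos = [f"teleop_clip_{i}" for i in range(10)]
batch = sample_mixed_batch(free_demos, real_demos, batch_size=22, seed=0)
n_real = sum(1 for source, _ in batch if source == "real_robot")
print(n_real, len(batch) - n_real)  # → 2 20
```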

Source: Ant Group Tianji Lab / arXiv