Holi-Spatial: ICML 2026 Oral — Fully Automated 3D Spatial Intelligence Data Pipeline with 4M-Scale Dataset

A research team from Shanghai Artificial Intelligence Laboratory, Northwestern Polytechnical University (NWPU), and Shanghai Jiao Tong University (SJTU) has introduced Holi-Spatial, a fully automated framework for constructing 3D spatial intelligence training data from raw video streams. The paper has been accepted as an Oral presentation at ICML 2026.

The Data Bottleneck in Spatial Intelligence

While multimodal large language models have advanced rapidly in image understanding, OCR, multi-image reasoning, and video QA, they still struggle with genuine 3D spatial understanding. Capabilities such as reasoning about spatial relationships between objects, estimating camera motion, and localizing objects across views require large-scale, fine-grained, geometrically constrained 3D data, a resource that has been scarce and expensive to produce.

Traditional approaches rely on manually annotated 3D datasets like ScanNet and ScanNet++, which are limited in scale and domain coverage. Holi-Spatial addresses this bottleneck by turning publicly available video data into structured spatial supervision automatically.

Three-Stage Automated Pipeline

Holi-Spatial operates through a three-stage pipeline:

Stage 1: Geometric Optimization. 3D Gaussian Splatting recovers multi-view consistent depth and point clouds.

Stage 2: Open-Vocabulary Perception. A VLM generates object categories, and SAM3 segmentation masks are back-projected into 3D.

Stage 3: Scene-Level Refinement. This stage covers multi-view merging, confidence filtering, VLM agent verification, and QA generation.
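To make the Stage 2 idea of back-projecting a 2D mask into 3D more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a standard pinhole camera model, a per-pixel depth map aligned with the mask, and a known camera-to-world pose. The function names and the intrinsics parameters `fx`, `fy`, `cx`, `cy` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_mask(
    mask: List[List[bool]],
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift every masked pixel with valid depth to a camera-frame 3D point.

    Assumes a pinhole camera: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points: List[Point3D] = []
    for v, (mask_row, depth_row) in enumerate(zip(mask, depth)):
        for u, (inside, z) in enumerate(zip(mask_row, depth_row)):
            if not inside or z <= 0.0:
                continue  # skip background pixels and missing depth
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def camera_to_world(
    points: Sequence[Point3D],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Point3D]:
    """Apply a rigid transform p_world = R * p_cam + t to each point."""
    world: List[Point3D] = []
    for p in points:
        x, y, z = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        world.append((x, y, z))
    return world


# Toy 2x2 frame: two masked pixels, constant depth of 2 metres.
mask = [[True, False], [False, True]]
depth = [[2.0, 2.0], [2.0, 2.0]]
cam_points = backproject_mask(mask, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
world_points = camera_to_world(cam_points, identity, (1.0, 0.0, 0.0))
print(world_points)
```

Repeating this for every frame and every mask would yield per-instance point sets in a shared world frame, which is the kind of input that a later multi-view merging step could consume.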

The pipeline produced Holi-Spatial-4M, a dataset containing over 4 million spatial annotations spanning 3D grounding, spatial QA, instance segmentation, and 3D detection across ScanNet, ScanNet++, and DL3DV-10K sources.

Performance and Impact

Experimental results show significant quality gains. On ScanNet++, depth F1 reaches 0.89, 2D segmentation IoU reaches 0.64, and 3D detection AP25/AP50 reach 81.06/70.05. Fine-tuning Qwen3-VL-8B on the dataset raises 3D grounding AP50 from 13.50 to 27.98, a gain of 14.48 points.
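For readers unfamiliar with the AP25/AP50 notation, the suffix is the 3D IoU threshold (0.25 or 0.50) at which a predicted box counts as matching a ground-truth box. The sketch below shows only that matching criterion, assuming axis-aligned boxes written as `(xmin, ymin, zmin, xmax, ymax, zmax)`. The box convention and function names are illustrative assumptions, not the paper's evaluation code, and full AP additionally averages precision over a ranked list of predictions.

```python
from typing import Sequence

Box = Sequence[float]  # assumed layout: (xmin, ymin, zmin, xmax, ymax, zmax)


def box_volume(box: Box) -> float:
    """Volume of an axis-aligned box; degenerate boxes have volume 0."""
    return (
        max(0.0, box[3] - box[0])
        * max(0.0, box[4] - box[1])
        * max(0.0, box[5] - box[2])
    )


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        low = max(a[axis], b[axis])
        high = min(a[axis + 3], b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis
        intersection *= high - low
    union = box_volume(a) + box_volume(b) - intersection
    return intersection / union if union > 0.0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float) -> bool:
    """A prediction matches when its IoU with the ground truth meets the threshold."""
    return iou_3d(pred, gt) >= threshold


gt_box = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
pred_box = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)  # shifted by half the box width

# Intersection volume is 4 and union is 12, so IoU = 1/3.
print(is_true_positive(pred_box, gt_box, 0.25))  # True: counts toward AP25
print(is_true_positive(pred_box, gt_box, 0.50))  # False: misses under AP50

# The reported grounding gain is a plain difference of AP50 values.
print(round(27.98 - 13.50, 2))  # 14.48
```

The example illustrates why AP50 is always the stricter of the two numbers: a box shifted by half its width still passes the 0.25 threshold but fails at 0.50.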

Holi-Spatial demonstrates that raw video can be automatically converted into structured, trainable spatial intelligence data, suggesting that future improvements in spatial AI may come as much from better data systems as from larger models. The approach is relevant to embodied AI, AR/VR, robot navigation, and scene understanding applications.

Paper: https://arxiv.org/abs/2603.07660 | Project: https://visionary-laboratory.github.io/holi-spatial/ | Code: https://github.com/Visionary-Laboratory/Holi-Spatial
