Research | June 13, 2026 | Embodied Global Team

China Team Open-Sources Embodied-R1.5: 8B Model Achieves SOTA on 16 of 24 Embodied AI Benchmarks

Tianjin University-led team open-sources Embodied-R1.5, an 8B-parameter embodied foundation model surpassing Gemini-Robotics-ER-1.5 and GPT-5.4 on 16/24 embodied VLM benchmarks, with full weights, datasets, and training code released.


A China-based research team has released Embodied-R1.5, a unified Embodied Foundation Model (EFM) that integrates a broad range of embodied reasoning capabilities within a single architecture and achieves state-of-the-art results on 16 of 24 embodied AI benchmarks.

The paper, titled "Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models," was posted on arXiv on June 9 (arXiv:2606.11324). The work was led by first author Yifu Yuan, a Ph.D. student at Tianjin University's Deep Reinforcement Learning Lab.

Key Technical Innovations

Embodied-R1.5 introduces three automated data construction pipelines that together produced a training dataset of over 15 billion tokens, significantly expanding data coverage for critical embodied capabilities. The team also designed a multi-task balanced reinforcement learning recipe to reduce conflicts between heterogeneous tasks during training.

A standout feature is the Planner-Grounder-Corrector (PGC) closed-loop framework, which enables a single model to autonomously execute and self-correct over long-horizon tasks without human intervention.
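As a rough illustration of what a plan, ground, and correct closed loop can look like, here is a minimal Python sketch. It is not the paper's implementation: the function `run_pgc_loop`, the `StepResult` record, the `max_retries` parameter, and the toy stand-in callables are all hypothetical names invented for this example. In Embodied-R1.5 a single model reportedly fills all three roles; the sketch only takes three callables so the control flow is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    subtask: str
    target: str
    success: bool
    attempts: int


def run_pgc_loop(
    instruction: str,
    plan: Callable[[str], List[str]],
    ground: Callable[[str, Optional[str]], str],
    execute: Callable[[str, str], bool],
    correct: Callable[[str, str], str],
    max_retries: int = 2,
) -> List[StepResult]:
    """Hypothetical planner -> grounder -> corrector loop (illustrative only)."""
    results: List[StepResult] = []
    for subtask in plan(instruction):          # planner: split into subtasks
        hint: Optional[str] = None
        target = ""
        success = False
        attempts = 0
        while attempts <= max_retries and not success:
            attempts += 1
            target = ground(subtask, hint)     # grounder: pick a concrete target
            success = execute(subtask, target) # act and observe the outcome
            if not success:
                hint = correct(subtask, target)  # corrector: propose a fix
        results.append(StepResult(subtask, target, success, attempts))
        if not success:
            break                              # stop once a subtask cannot be recovered
    return results


# Toy stand-ins (assumptions for the demo, not real model calls).
def toy_plan(instruction: str) -> List[str]:
    return ["open drawer", "pick up cup"]


def toy_ground(subtask: str, hint: Optional[str]) -> str:
    if hint is not None:
        return hint                            # follow the corrector's suggestion
    return "wrong_handle" if subtask == "open drawer" else "cup"


def toy_execute(subtask: str, target: str) -> bool:
    return target != "wrong_handle"            # the first grounding attempt fails


def toy_correct(subtask: str, failed_target: str) -> str:
    return "drawer_handle"


results = run_pgc_loop(
    "put the cup in the drawer", toy_plan, toy_ground, toy_execute, toy_correct
)
for r in results:
    print(f"{r.subtask}: target={r.target}, success={r.success}, attempts={r.attempts}")
```

In the toy run, the first subtask fails once, receives a corrected target, and succeeds on the retry, which is the self-correction behavior the closed loop is meant to provide without human intervention.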

Benchmark Performance

With only 8 billion parameters, Embodied-R1.5 achieves SOTA (State-of-the-Art) on 16 out of 24 embodied VLM (Vision-Language Model) benchmarks, surpassing leading proprietary models including Google DeepMind's Gemini-Robotics-ER-1.5 and OpenAI's GPT-5.4.

When fine-tuned into a Vision-Language-Action (VLA) model, it outperforms leading VLA models including pi0.5 across four popular manipulation benchmark suites. The team also conducted extensive zero-shot real-robot experiments, validating performance in instruction following, affordance grounding, articulated object manipulation, and long-horizon complex tasks.

Full Open-Source Release

In a notable departure from typical closed-source embodied AI research, the team has open-sourced the model weights, datasets, training code, and EmbodiedEvalKit (an evaluation framework tailored for embodied tasks) on Hugging Face and GitHub. Embodied-R1.5 is the successor to Embodied-R1, which was accepted at ICLR 2026.

The open release lowers barriers for researchers worldwide working on robot reasoning and manipulation, giving them weights, data, and evaluation tooling to build on.
