A China-based research team has posted a paper on arXiv describing a new robotics-focused AI model. The team says the model outperforms major proprietary systems on many embodied-AI benchmarks, and it has publicly released the model's weights, datasets and code.
The paper, "Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models," was posted June 9 as arXiv:2606.11324. Its first author is Yifu Yuan, who identifies himself publicly as a Ph.D. student at Tianjin University's Deep Reinforcement Learning Lab.
The authors describe Embodied-R1.5 as a unified "Embodied Foundation Model" that combines embodied cognition, task planning, correction and pointing in a single architecture. They used three automated data-construction pipelines to assemble more than 15 billion tokens of training data. They also introduced what they call a Planner-Grounder-Corrector (PGC) closed-loop framework, which lets a single model execute long-horizon tasks and correct itself along the way.
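The paper's PGC framework is only summarized at a high level here. To illustrate the general shape of a plan, execute and check loop, the following is a minimal Python sketch under stated assumptions. The toy world, the stub functions (`plan_steps`, `ground_step`, `check_step`, `run_pgc`) and the retry policy are all hypothetical and are not taken from the Embodied-R1.5 paper or its released code. In the paper, a single model fills all three roles.

```python
from dataclasses import dataclass, field

# Hypothetical toy sketch of a planner/grounder/corrector closed loop.
# None of these names come from Embodied-R1.5; the three roles are stubs here,
# whereas the paper describes one model performing all of them.


@dataclass
class ToyWorld:
    """Toy environment: a set of facts that executed steps make true."""
    facts: set = field(default_factory=set)
    fail_once: set = field(default_factory=set)  # steps that fail on the first try

    def execute(self, action: str) -> None:
        if action in self.fail_once:
            self.fail_once.discard(action)  # simulate a one-off failure
            return
        self.facts.add(action)


def plan_steps(task: list[str]) -> list[str]:
    """Planner stub: break a task into ordered sub-steps (given directly here)."""
    return list(task)


def ground_step(step: str) -> str:
    """Grounder stub: map a sub-step to an executable action (identity here)."""
    return step


def check_step(step: str, world: ToyWorld) -> bool:
    """Corrector stub: verify that the step's effect is visible in the world."""
    return step in world.facts


def run_pgc(task: list[str], world: ToyWorld, max_retries: int = 2) -> list[str]:
    """Run each planned step, re-executing it when the check fails."""
    log = []
    for step in plan_steps(task):
        for attempt in range(max_retries + 1):
            world.execute(ground_step(step))
            if check_step(step, world):
                log.append(f"{step}: ok (attempt {attempt + 1})")
                break
        else:
            # Retries exhausted: record the failure and abort the long-horizon task.
            log.append(f"{step}: failed")
            break
    return log
```

For example, with `ToyWorld(fail_once={"grasp cup"})`, running `run_pgc(["open drawer", "grasp cup", "place cup in drawer"], world)` succeeds on every step, but "grasp cup" needs a second attempt after the check catches the first failure. The retry-then-abort policy is an illustrative choice, not something the paper specifies.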
"With only 8B parameters, Embodied-R1.5 achieves SOTA on 16 out of 24 embodied VLM benchmarks, surpassing leading models like Gemini-Robotics-ER-1.5 and GPT-5.4," the authors wrote.
According to the paper, the model can be fine-tuned into a vision-language-action (VLA) system with relatively little data. That version reportedly outperforms leading VLA models, including π0.5, across four manipulation benchmark suites. The paper also reports zero-shot real-robot experiments in instruction following, affordance grounding, articulated object manipulation and long-horizon tasks.
The open release is central to the announcement. The authors open-sourced the model weights, datasets, training code and EmbodiedEvalKit, an evaluation framework tailored to embodied tasks. Related artifacts are available on Hugging Face under the IffYuan account.
Source: Agentic Tribune (https://agentictribune.com/article/20260611-china-team-open-sources-embodied-r1-5-claims-sota-on-many-embodied-ai-benchmarks)
