Figure: Embodied-Reasoner framework, showing a reasoning and interaction trajectory in the AI2-THOR simulator environment
Research | June 18, 2026 | Stax

Embodied-Reasoner: ZJU, CAS and Alibaba Open-Source Reasoning Model That Outperforms OpenAI o1

A joint team from Zhejiang University, the Chinese Academy of Sciences, and Alibaba DAMO Academy has open-sourced Embodied-Reasoner, a multimodal embodied reasoning model that achieves an 80.96% task success rate in AI2-THOR, surpassing OpenAI o1 (71.73%), o3-mini (56.55%), and Claude-3.7 (67.70%). The model combines visual search, spatial reasoning, and self-correction capabilities.


Open-Source Breakthrough in Embodied Reasoning

A joint research team from Zhejiang University's College of Computer Science, the Institute of Software at the Chinese Academy of Sciences (CAS), and Alibaba Group's DAMO Academy has released Embodied-Reasoner, a fully open-source multimodal embodied reasoning model that brings o1-style deep thinking to interactive physical tasks.

Outperforming Industry Giants

In comprehensive evaluations across 809 test cases in the AI2-THOR simulator, Embodied-Reasoner (7B) achieved:

  • 80.96% task success rate vs 71.73% (OpenAI o1), 56.55% (o3-mini), 67.70% (Claude-3.7)
  • 55.07% search efficiency, the highest among all tested models
  • 86.30% task completeness, outperforming all competitors
  • 54.29% success rate on composite multi-step tasks, nearly 4x better than o3-mini
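
To put the success-rate figures above in perspective, the short sketch below computes the percentage-point gap between Embodied-Reasoner (7B) and each baseline. It uses only the numbers quoted in this article; the dictionary layout and the function name `margins_over_baselines` are illustrative choices, not part of the released codebase.

```python
# Task success rates (%) exactly as quoted in the list above.
success_rate = {
    "Embodied-Reasoner (7B)": 80.96,
    "OpenAI o1": 71.73,
    "Claude-3.7": 67.70,
    "o3-mini": 56.55,
}


def margins_over_baselines(rates, model):
    """Return the percentage-point gap between `model` and every other entry."""
    ours = rates[model]
    return {
        name: round(ours - rate, 2)
        for name, rate in rates.items()
        if name != model
    }


margins = margins_over_baselines(success_rate, "Embodied-Reasoner (7B)")

# Print the smallest gap first (the closest competitor).
for name, gap in sorted(margins.items(), key=lambda kv: kv[1]):
    print(f"{name}: +{gap:.2f} points")
```

On these figures the closest baseline is OpenAI o1, at a gap of 9.23 percentage points, and the widest gap is to o3-mini, at 24.41 points.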

Three-Stage Training Pipeline

The team attributes the model's performance to a three-stage training pipeline:

  1. Imitation Learning: Fine-tuning on 9,300 synthesized Observation-Thought-Action trajectories (64K images, 8M thought tokens) covering 107 indoor scenes.
  2. Self-Exploration (Rejection Sampling): The model generates multiple trajectories on novel tasks and uses successful ones to enhance exploration abilities.
  3. Self-Correction (Reflection Tuning): By injecting anomalous states and reflective thinking, the model learns to detect and correct its own errors.
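
The second stage can be illustrated with a toy sketch of rejection sampling: sample several trajectories per task and keep only the ones that succeed as new training data. Everything here is a simplified assumption for illustration. The `Step` and `Trajectory` classes, the random `toy_rollout` policy, and the room names are hypothetical stand-ins, not the authors' data format, model, or the AI2-THOR API.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Step:
    observation: str  # stand-in for an image observation
    thought: str      # the model's reasoning before acting
    action: str       # the action taken in the environment


@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)
    success: bool = False


ROOMS = ["kitchen", "bathroom", "bedroom", "living room"]


def toy_rollout(task, target_room, rng, max_steps=4):
    """Hypothetical policy: visit random rooms until the target room is reached."""
    traj = Trajectory(task=task)
    for _ in range(max_steps):
        room = rng.choice(ROOMS)
        traj.steps.append(Step(
            observation=f"view of {room}",
            thought=f"The object might be in the {room}.",
            action=f"navigate to {room}",
        ))
        if room == target_room:
            traj.success = True
            break
    return traj


def rejection_sample(tasks, num_samples, rng):
    """Sample several rollouts per task and keep only the successful ones."""
    kept = []
    for task, target_room in tasks:
        candidates = [toy_rollout(task, target_room, rng) for _ in range(num_samples)]
        kept.extend(t for t in candidates if t.success)
    return kept


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tasks = [("find the mug", "kitchen"), ("find the towel", "bathroom")]
accepted = rejection_sample(tasks, num_samples=8, rng=rng)
print(f"kept {len(accepted)} of {len(tasks) * 8} sampled trajectories")
```

In a real pipeline the accepted trajectories would be fed back into fine-tuning. The third stage would additionally edit trajectories to include an injected anomalous state followed by a corrective thought, which this sketch does not attempt to model.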

Real-World Validation

Beyond simulation, the team validated Embodied-Reasoner in real-world object search tasks across kitchen, bathroom, and bedroom scenes. The model demonstrated consistent spatial reasoning and efficient search behavior, avoiding the repetitive searches and logical inconsistencies observed in OpenAI o3-mini.

Open Availability

Embodied-Reasoner is available in 2B and 7B parameter versions, with the complete training dataset and codebase released on GitHub and Hugging Face.

Paper: arXiv:2503.21696 | Code: https://github.com/zwq2018/embodied_reasoner
