PhysTool-Bench Reveals Critical Bottleneck: Even the Best MLLMs Fail at Physical Tool Use

The Hidden Bottleneck in Embodied AI

Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in using digital APIs — booking flights, querying databases, and navigating the web. However, a new benchmark reveals a startling gap when these models are asked to interact with the physical world.

PhysTool-Bench, introduced by researchers from Singapore Management University and The Hong Kong Polytechnic University in a paper published on arXiv (2606.10803) on June 9, 2026, is the first comprehensive benchmark designed to evaluate MLLMs' ability to recognize, select, and plan the use of physical tools in real-world scenarios.

The Benchmark

PhysTool-Bench comprises 2,510 queries over 2,678 real-world physical tools spanning diverse domains including manufacturing, electrical work, agriculture, and healthcare. Models are evaluated along two primary dimensions:

Task I - Physical Tool Recognition: Identifying all tools present in a scene
Task II - Tool Selection and Action Planning: Selecting the correct tools and placing them in the right execution order based on an instruction
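To make the two tasks concrete, here is a minimal Python sketch of how answers of this kind could be scored. It is an illustration only: the article does not describe the benchmark's actual metrics, so the function names, the recall-style score for Task I, and the exact-order match for Task II are all assumptions, as are the example tools.

```python
from typing import Sequence


def recognition_recall(predicted: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Hypothetical Task I style score: share of ground-truth tools the model named."""
    truth = {tool.lower() for tool in ground_truth}
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    found = {tool.lower() for tool in predicted} & truth
    return len(found) / len(truth)


def plan_is_correct(predicted_plan: Sequence[str], reference_plan: Sequence[str]) -> bool:
    """Hypothetical Task II style check: same tools, in the same execution order."""
    return [t.lower() for t in predicted_plan] == [t.lower() for t in reference_plan]


# Made-up scene and instruction, for illustration only.
scene_tools = ["hammer", "nail", "pliers", "tape measure"]
model_tools = ["hammer", "nail", "wrench"]
print(recognition_recall(model_tools, scene_tools))  # 2 of 4 tools found

reference_plan = ["tape measure", "nail", "hammer"]
print(plan_is_correct(["nail", "hammer"], reference_plan))                  # missing a step
print(plan_is_correct(["tape measure", "nail", "hammer"], reference_plan))  # exact match
```

Under a strict ordered match like this one, a plan that names the right tools in the wrong order still counts as a failure, which is one reason an end-to-end score can sit well below a recognition score.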

Findings: A Two-Level Deficit

Across 13 leading MLLMs, the results were sobering:

Model	Tool Recognition (Task I)	End-to-End Completion (Task II)
Gemini-3.1-Pro (best)	58.7%	21.0%
GPT-5.4	~52%	~16%
Claude 4.5 Opus	~48%	~14%
Other models	30-50%	5-15%

Even the strongest model, Gemini-3.1-Pro, scored only 58.7% on tool recognition and completed just 21.0% of end-to-end queries, roughly one in five.
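The size of the drop from Task I to Task II can be read straight off the table above. The short Python sketch below simply subtracts the two columns for the three named models. The GPT-5.4 and Claude 4.5 Opus figures are the approximate values given in the table, so their gaps are approximate as well.

```python
# (Task I recognition %, Task II end-to-end completion %) from the table above.
# Values the article marks with "~" are approximate.
scores = {
    "Gemini-3.1-Pro": (58.7, 21.0),
    "GPT-5.4": (52.0, 16.0),
    "Claude 4.5 Opus": (48.0, 14.0),
}


def point_gap(recognition: float, completion: float) -> float:
    """Difference between the two scores, in percentage points."""
    return round(recognition - completion, 1)


gaps = {model: point_gap(rec, comp) for model, (rec, comp) in scores.items()}

for model, gap in gaps.items():
    print(f"{model}: {gap} percentage points between Task I and Task II")
```

For Gemini-3.1-Pro this works out to a 37.7-point gap between the recognition score and the end-to-end completion score.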

The Core Problem: Functional Commonsense

The researchers' analysis reveals two distinct deficits:

Perception Deficit: MLLMs struggle to perceive tools in realistic, cluttered scenes, though this is the smaller of the two gaps.
Functional Commonsense Deficit: The far larger drop occurs at the planning stage, where models fail to map the tools they perceive onto task semantics. Even when a model correctly sees a hammer, it may not understand that it is the right tool for driving a nail.

This "functional commonsense" gap — the ability to connect visual recognition with practical task semantics — is identified as the central bottleneck for practical embodied AI deployment.

Implications for Embodied AI

While MLLMs increasingly serve as the "brain" of embodied AI systems, instructing robots to interact with the physical world, this research shows that the path from digital tool mastery to physical world competence is far from complete. The findings suggest that future embodied AI research must focus not just on larger models or more data, but on bridging this fundamental "functional commonsense" gap — teaching AI systems to understand not just what tools look like, but what they do and how they should be used in context.

Paper: arXiv:2606.10803 - "Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use"
Authors: Zhixin Ma, Yutong Zhou, Yongqi Li, Chong-Wah Ngo, Wenjie Li
