Li Fei-Fei Team Proposes ENACT Benchmark to Reveal Human-AI Gap in Embodied Cognition
Research | June 21, 2026 | EG Editorial

Stanford and Northwestern researchers introduce ENACT, a new benchmark evaluating embodied cognition in VLMs. Key findings: significant human-AI gap, models better at inverse than forward world modeling, and anthropocentric biases including right-hand preference.

Li Fei-Fei and Manling Li's Team Proposes ENACT: A New Benchmark to Evaluate Embodied Cognition in VLMs

A research team led by Stanford Professor Li Fei-Fei and Northwestern University Assistant Professor Manling Li has introduced ENACT, a novel benchmark designed to evaluate whether vision-language models (VLMs) exhibit embodied cognition — the ability to understand and reason about sensorimotor interaction in the physical world.

Published on arXiv (arXiv:2511.20937), ENACT reframes embodied cognition evaluation as world modeling from egocentric interaction, using a visual question answering (VQA) format. The benchmark comprises two core tasks: forward world modeling (reordering shuffled future observations given actions) and inverse world modeling (reordering shuffled actions given observations).
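To make the reordering format concrete, here is a minimal Python sketch of how one such question could be represented and scored. It is an illustration under stated assumptions, not the ENACT implementation: the `ReorderQuestion` class, the `make_question` helper, the placeholder frame names, and both scoring functions (exact match and pairwise order agreement) are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class ReorderQuestion:
    """Hypothetical container for one reordering question (not the ENACT schema)."""
    given: list[str]     # ordered context shown to the model
    shuffled: list[str]  # items the model must put back in order
    answer: list[int]    # answer[i] = index into `shuffled` of the i-th item in true order


def make_question(given, ordered_items, seed=0):
    """Shuffle `ordered_items` and record the permutation that restores the true order."""
    rng = random.Random(seed)
    order = list(range(len(ordered_items)))
    rng.shuffle(order)
    shuffled = [ordered_items[i] for i in order]
    # order[j] is the true position of shuffled[j]; invert it to get the answer key.
    answer = [order.index(i) for i in range(len(ordered_items))]
    return ReorderQuestion(list(given), shuffled, answer)


def exact_match(pred, answer):
    """True only if the whole predicted ordering is correct."""
    return list(pred) == list(answer)


def pairwise_accuracy(pred, answer):
    """Fraction of item pairs whose relative order in `pred` agrees with `answer`.

    Assumes `pred` is a permutation of the same indices as `answer`.
    """
    pos = {item: k for k, item in enumerate(pred)}
    pairs = [(a, b) for i, a in enumerate(answer) for b in answer[i + 1:]]
    if not pairs:
        return 1.0
    return sum(pos[a] < pos[b] for a, b in pairs) / len(pairs)


# Placeholder data: strings stand in for egocentric frames and action descriptions.
actions = ["open fridge", "pick up milk", "close fridge"]
observations = ["frame_after_open", "frame_after_pick", "frame_after_close"]

# Forward world modeling: actions are given, future observations are shuffled.
forward_q = make_question(given=actions, ordered_items=observations, seed=1)
# Inverse world modeling: observations are given, actions are shuffled.
inverse_q = make_question(given=observations, ordered_items=actions, seed=2)
```

A model's reply would be parsed into a list of indices into `shuffled` and compared against `answer`. Pairwise agreement is included only because it gives partial credit on longer sequences, where exact match becomes strict.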

Key Findings

  • Significant human-AI gap: Frontier VLMs, including GPT-5 and GLM-4.5V, fell well short of human performance, and the gap widened as the interaction horizon increased.
  • Inverse > Forward: All tested models performed consistently better on inverse world modeling than on forward world modeling, suggesting current VLMs are better at explaining past actions than at predicting future states.
  • Anthropocentric biases: Models exhibited a preference for right-handed actions and degraded performance when camera viewpoints deviated from human vision.

The team built a scalable data generation pipeline using the BEHAVIOR robotics simulator, creating 8,972 QA pairs covering 29 long-horizon home-scale activities. The work is a collaboration among researchers at Northwestern University, Stanford University, and UCLA.

Source: arXiv:2511.20937
