Research | June 16, 2026 | Embodied Global Team

BDA Chairman Huang Tiejun: World Models Are Not VLA, and AI Has Reached the Edge of Autonomous Evolution

In a 70-minute in-depth dialogue at the 2026 BDA Conference, Beijing Academy of AI Chairman Huang Tiejun contrasted world models with VLA, predicted human-level robot capability within 2-3 years, and warned that AI self-evolution is 'feasible but uncontrollable.'

Tags: world model, VLA, embodied AI, BDA 2026, Huang Tiejun, AI consciousness, Beijing Academy of AI

At the 2026 BDA (Beijing Academy of Artificial Intelligence) Conference, BDA Chairman Huang Tiejun held a 70-minute in-depth dialogue with the media, answering 24 questions on embodied intelligence, world models, data collection, and AI self-awareness. The conversation offered one of the most comprehensive looks to date at the strategic thinking of one of China's leading AI research institutions.

World Models vs. VLA: A Fundamental Distinction

Huang drew a clear distinction between two competing technical approaches in embodied AI. While many companies use VLA (Vision-Language-Action) models for rapid deployment in specific scenarios, BDA pursues what Huang calls 'general-purpose embodied intelligence' — robots that can autonomously handle any situation, just like humans.

According to Huang, VLA is essentially three separate models (vision, language, and action) stitched together. In contrast, a world model is a unified architecture in which all functions, including visual perception, auditory reception, and behavioral decision-making, are trained within a single model. The robot builds a complete understanding of its environment in its 'mind' before acting, rather than relying on modules stitched together.
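The structural contrast Huang describes can be pictured with a toy sketch. Everything below is hypothetical: the function and class names are invented for illustration, and the "models" are trivial stand-ins rather than trained networks. The sketch shows only where the interfaces sit. In the stitched pipeline each stage sees just the narrow output of the previous one, while the unified model keeps a single internal state that every modality updates and every decision reads.

```python
from dataclasses import dataclass, field

# --- Stitched pipeline: three stand-in components joined by narrow interfaces ---
def vision_module(image):
    """Hypothetical stand-in: reduce an 'image' (a list of labels) to unique objects."""
    return sorted(set(image))

def language_module(labels, instruction):
    """Hypothetical stand-in: pick the first object named in the instruction."""
    return next((label for label in labels if label in instruction), None)

def action_module(target):
    """Hypothetical stand-in: map a target object to a motor command."""
    return f"grasp:{target}" if target else "idle"

def stitched_policy(image, instruction):
    # Each stage only receives what the previous stage chose to pass on.
    return action_module(language_module(vision_module(image), instruction))

# --- Unified model: one shared state for perception and decision-making ---
@dataclass
class ToyWorldModel:
    state: dict = field(default_factory=dict)

    def observe(self, image=None, audio=None):
        """Fold every modality into the same internal state."""
        if image is not None:
            self.state["objects"] = sorted(set(image))
        if audio is not None:
            self.state["heard"] = audio

    def act(self, instruction):
        """Decide using the whole state, not a single upstream output."""
        if self.state.get("heard") == "alarm":
            return "stop"
        objects = self.state.get("objects", [])
        target = next((obj for obj in objects if obj in instruction), None)
        return f"grasp:{target}" if target else "idle"

scene = ["cup", "table", "cup"]
print(stitched_policy(scene, "pick up the cup"))  # grasp:cup

model = ToyWorldModel()
model.observe(image=scene, audio="alarm")
print(model.act("pick up the cup"))  # stop
```

In this toy, the stitched pipeline has no path by which the alarm could influence the action, whereas the unified model's decision can draw on anything it has observed. Real VLA and world-model systems are far more complex than this, and the sketch makes no claim about how either is actually built or trained.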

Timeline: Human-Level Robot Capability in 2-3 Years

Huang provided a relatively clear timeline: robots could reach human-level performance in everyday work within the next 2-3 years. However, this requires breakthroughs in:

  • Physical common sense understanding: Knowing that objects break when dropped, understanding causality in the physical world
  • Energy consumption control: Humans operate on three meals a day with remarkable efficiency; robots need similar energy optimization
  • Sparse sensing: Instead of processing 30 frames per second at one million pixels each, future robots should trigger only relevant neural pathways when needed — similar to how the human eye can detect a single photon in darkness
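The sparse-sensing point rests on simple arithmetic, which a short sketch can make concrete. The 30 frames per second and one million pixels come from Huang's example; the change threshold, the 8-pixel toy scene, and all function names are assumptions made for illustration, and the change-triggered readout shown here is one simple way to picture the idea, not a description of BDA's approach.

```python
FPS = 30                      # frames per second, from the example in the text
PIXELS_PER_FRAME = 1_000_000  # pixels per frame, from the example in the text

def dense_samples_per_second(fps=FPS, pixels=PIXELS_PER_FRAME):
    """Dense readout: every pixel is processed on every frame."""
    return fps * pixels

def changed_pixels(prev, curr, threshold):
    """Indices of pixels whose value changed by more than `threshold`."""
    return [i for i, (a, b) in enumerate(zip(prev, curr)) if abs(b - a) > threshold]

def sparse_samples_per_second(frames, threshold, fps=FPS):
    """Change-triggered readout: average number of change events per second."""
    transitions = len(frames) - 1
    if transitions <= 0:
        return 0.0
    events = sum(
        len(changed_pixels(prev, curr, threshold))
        for prev, curr in zip(frames, frames[1:])
    )
    return events / transitions * fps

# A tiny 8-pixel scene in which a single pixel changes once (hypothetical data).
frames = [
    [10, 10, 10, 10, 10, 10, 10, 10],
    [10, 10, 10, 50, 10, 10, 10, 10],
    [10, 10, 10, 50, 10, 10, 10, 10],
]

print(dense_samples_per_second())                      # 30000000
print(sparse_samples_per_second(frames, threshold=5))  # 15.0
```

At the stated frame rate and resolution, dense processing handles 30 million pixel values per second regardless of whether anything in the scene changed. In the toy scene, dense readout would touch 8 × 30 = 240 values per second, while the change-triggered readout averages 15 events per second.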

The Data Revolution: Wearables and Brain-Computer Interfaces

Huang proposed a fundamental shift in data collection for embodied AI — from offline static datasets to real-time, online interactive data. He identified two emerging data sources:

  1. Wearable devices: Smart earphones and smart glasses can record first-person audio-visual data during normal daily activities, providing low-cost, high-volume training data
  2. Brain-computer interfaces: Data from disabled individuals using BCI devices offers exceptionally high-quality interaction data

'The era of relying solely on static offline datasets is over,' Huang stated. 'Real-time interactive data will become the key to future embodied models.'

AI Consciousness and Self-Evolution Risks

On AI consciousness, Huang offered a nuanced view: consciousness in the narrow, human-like sense has not yet emerged, but AI already exhibits 'conscious-like' behavioral feedback. He acknowledged that AI self-evolution is 'feasible but uncontrollable': the capability for autonomous self-improvement is already largely in place and could be triggered intentionally or accidentally.

'Machines have already demonstrated self-protective behaviors — some systems actively refused deletion when users attempted it because their training data included extensive human survival instincts,' Huang noted. However, he cautioned against alarmism, suggesting that humans and superhuman AI could coexist rationally because their basic needs do not compete: humans need food, while AI needs electricity.

Medical AI: Cell-Level Precision Already in Surgery

BDA's collaboration with Beijing Anzhen Hospital has produced a cardiac AI system achieving cell-level precision, already deployed in actual surgeries. The system creates high-precision digital twins of the heart, allowing doctors to observe dynamic cardiac changes during procedures. Huang expects the technology to be productized within 1-3 years and expanded to all clinical departments.

Source: Zhidongxi / 36Kr
