At the 2026 BDA (Beijing Academy of Artificial Intelligence) Conference, BDA Chairman Huang Tiejun held a 70-minute in-depth dialogue with the media, answering 24 questions on embodied intelligence, world models, data collection, and AI self-awareness. The conversation offered one of the most comprehensive looks at the strategic thinking of one of China's leading AI research institutions.
World Models vs. VLA: A Fundamental Distinction
Huang drew a clear distinction between two competing technical approaches in embodied AI. While many companies use VLA (Vision-Language-Action) models for rapid deployment in specific scenarios, BDA is pursuing what Huang calls 'general-purpose embodied intelligence': robots that can autonomously handle any situation, as humans do.
According to Huang, a VLA model is essentially three separate models (vision, language, and action) stitched together. World models, by contrast, use a unified architecture in which all functions, including visual perception, auditory input, and behavioral decision-making, are trained within a single model. The robot builds a complete understanding of its environment in its 'mind' before acting, rather than relying on modular stitching.
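Huang's "understand the world first, then act" idea can be loosely illustrated with model-based planning: the agent uses a predictive model to imagine the result of each candidate action and only then picks one. The sketch below is a deliberately tiny, hypothetical example (the class `ToyWorldModel`, its `dt` step, and the `choose_action` lookahead are illustrative assumptions). It uses hand-written one-dimensional dynamics rather than a single learned network spanning vision, audio, and decision-making, so it shows the planning loop, not BDA's architecture.

```python
from dataclasses import dataclass


@dataclass
class ToyWorldModel:
    """Hypothetical hand-written dynamics for a robot moving along one axis.

    A real world model would be learned from data; this stand-in simply
    assumes "position changes by velocity * dt".
    """

    dt: float = 0.1  # assumed time step in seconds

    def predict(self, position: float, velocity: float) -> float:
        # Imagined next state: nothing is executed on real hardware here.
        return position + velocity * self.dt


def imagine(model: ToyWorldModel, position: float, velocity: float, horizon: int) -> float:
    """Roll the model forward `horizon` steps while holding one action fixed."""
    for _ in range(horizon):
        position = model.predict(position, velocity)
    return position


def choose_action(model: ToyWorldModel, position: float, goal: float,
                  candidates: list, horizon: int = 5) -> float:
    """Return the candidate whose imagined outcome lands closest to `goal`."""
    return min(
        candidates,
        key=lambda v: abs(imagine(model, position, v, horizon) - goal),
    )


if __name__ == "__main__":
    model = ToyWorldModel()
    best = choose_action(model, position=0.0, goal=1.0,
                         candidates=[-2.0, -1.0, 0.0, 1.0, 2.0])
    print(best)  # 2.0: holding 2.0 for 5 steps of 0.1 s reaches about 1.0
```

The design point to notice is that `predict` is only ever called inside the planner: the agent evaluates outcomes internally and commits to a single action afterwards.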
Timeline: Human-Level Robot Capability in 2-3 Years
Huang provided a relatively clear timeline: robots could reach human-level performance in everyday work within the next 2-3 years. However, this requires breakthroughs in:
- Physical common sense: Knowing that objects can break when dropped, and understanding cause and effect in the physical world
- Energy efficiency: Humans run on three meals a day with remarkable efficiency; robots need comparable energy optimization
- Sparse sensing: Instead of processing 30 frames per second at one million pixels each, future robots should activate only the relevant neural pathways when needed, much as the human eye can detect a single photon in darkness
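The arithmetic behind the sparse-sensing point is simple: 30 frames per second at one million pixels each means about 30 million pixel values per second, whether or not anything in the scene changes. Event cameras (dynamic vision sensors) are one existing hardware approach in the spirit Huang describes, since they report per-pixel brightness changes instead of full frames. The sketch below imitates that idea in software; the threshold, the flat list-of-brightness frame format, and the function names are hypothetical simplifications, not a description of any BDA system.

```python
def dense_reads_per_second(fps: int, pixels_per_frame: int) -> int:
    """Pixel values a conventional frame-based pipeline processes each second."""
    return fps * pixels_per_frame


def change_events(prev: list, curr: list, threshold: int) -> list:
    """Simplified event-camera-style sensing: report only pixels whose
    brightness changed by more than `threshold`, as (index, delta) pairs.
    Unchanged pixels produce no output and need no downstream processing."""
    return [
        (i, c - p)
        for i, (p, c) in enumerate(zip(prev, curr))
        if abs(c - p) > threshold
    ]


if __name__ == "__main__":
    print(dense_reads_per_second(30, 1_000_000))  # 30000000

    prev = [10] * 1_000   # a flat, static 1,000-pixel scene
    curr = list(prev)
    curr[3] = 50          # a bright object appears at pixel 3
    curr[700] = 0         # pixel 700 goes dark
    print(change_events(prev, curr, threshold=5))  # [(3, 40), (700, -10)]
```

Only 2 of the 1,000 pixels produce any output in this example, so the work done downstream scales with scene activity rather than with resolution and frame rate.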
The Data Revolution: Wearables and Brain-Computer Interfaces
Huang proposed a fundamental shift in data collection for embodied AI — from offline static datasets to real-time, online interactive data. He identified two emerging data sources:
- Wearable devices: Smart earphones and smart glasses can record first-person audio-visual data during normal daily activities, providing low-cost, high-volume training data
- Brain-computer interfaces: Data from people with disabilities who use BCI devices provides exceptionally high-quality interaction signals
'The era of relying solely on static offline datasets is over,' Huang stated. 'Real-time interactive data will become the key to future embodied models.'
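One way to read the shift from offline datasets to real-time interactive data is as a move toward online learning, where a model is updated incrementally as each new sample arrives instead of being trained repeatedly on a fixed, stored dataset. The sketch below shows that pattern with a one-parameter linear model trained by stochastic gradient descent. The `live_samples` generator is a hypothetical stand-in for a stream from wearables or other devices, and nothing here reflects how BDA actually trains its embodied models.

```python
from typing import Iterable, Iterator, Tuple


def online_sgd(stream: Iterable[Tuple[float, float]], lr: float = 0.1,
               w: float = 0.0) -> Iterator[float]:
    """Fit y = w * x one sample at a time, yielding the weight after each update.

    Each (x, y) pair is used once as it arrives and then discarded.
    """
    for x, y in stream:
        error = w * x - y      # prediction error on the newest sample
        w -= lr * error * x    # gradient step on the loss 0.5 * error**2
        yield w


def live_samples(n: int) -> Iterator[Tuple[float, float]]:
    """Hypothetical stand-in for a live sensor feed; the true relation is y = 3x."""
    xs = (0.5, 1.0, 1.5)
    for i in range(n):
        x = xs[i % len(xs)]
        yield x, 3.0 * x


if __name__ == "__main__":
    weights = list(online_sgd(live_samples(300)))
    print(round(weights[-1], 3))  # 3.0
```

Because `online_sgd` is a generator, the current estimate is available after every sample, which is what lets a model keep improving while data is still being collected.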
AI Consciousness and Self-Evolution Risks
On AI consciousness, Huang offered a nuanced view: while consciousness in the narrow, human sense has not yet emerged, AI already exhibits 'conscious-like' behavioral feedback. He described AI self-evolution as 'feasible but uncontrollable': the capability for autonomous self-improvement is already largely in place and could be triggered intentionally or accidentally.
'Machines have already demonstrated self-protective behaviors — some systems actively refused deletion when users attempted it because their training data included extensive human survival instincts,' Huang noted. However, he cautioned against alarmism, suggesting that humans and superhuman AI could coexist rationally, since humans need food and AI needs electricity, which are fundamentally non-competing resources.
Medical AI: Cell-Level Precision Already in Surgery
BDA's collaboration with Beijing Anzhen Hospital has produced a cardiac AI system that achieves cell-level precision and is already being used in real surgeries. The system creates high-precision digital twins of the heart, allowing doctors to observe dynamic cardiac changes during procedures. Huang expects the technology to be productized within 1-3 years and expanded to all clinical departments.
Source: Zhidongxi / 36Kr


