[Figure: Ambient Diffusion Policy method overview, showing how suboptimal data is restricted to low and high noise levels during diffusion policy training]
Research | June 15, 2026 | Embodied Global Team

Ambient Diffusion Policy: MIT's Principled Method for Imitation Learning from Suboptimal Robot Data

MIT researchers propose Ambient Diffusion Policy, a principled method that extracts useful features from suboptimal robot data by controlling the noise levels at which that data is used during training. It outperforms existing co-training baselines by up to 33% on Open X-Embodiment and requires only a single change to the dataloader.

#MIT, #Diffusion Policy, #Imitation Learning, #Suboptimal Data, #Open X-Embodiment, #Robotics

A team of researchers from MIT CSAIL has introduced Ambient Diffusion Policy, a simple yet principled method for imitation learning from suboptimal robot data. The approach addresses one of the most pressing challenges in robotics: how to make effective use of abundant, lower-quality demonstration data alongside scarce, high-quality expert demonstrations.

High-quality, task-specific robot data is expensive and time-consuming to collect. Suboptimal datasets (noisy trajectories, simulation data, cross-embodiment demonstrations, and large heterogeneous collections such as Open X-Embodiment) are widely available but difficult to use effectively.

The Key Insight: Noise-Dependent Data Usage

The researchers observed that Diffusion Policy learns different features of robot action data at different noise levels, driven by a spectral power law in action data. At high noise levels, the optimal denoiser learns global task-level structure; at low noise levels, it refines local motion-level primitives.
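As a rough intuition for the power-law argument, the following Python sketch counts how many frequency bins of a signal remain above a white-noise floor as the noise level grows. It is an illustration only, not the paper's analysis: the function name `surviving_frequencies`, the exponent `alpha`, and the number of bins are all assumptions chosen for the example.

```python
import numpy as np

def surviving_frequencies(sigma, num_freqs=64, alpha=2.0):
    """Count frequency bins whose signal power exceeds the noise power.

    Assumption (hypothetical, for illustration): signal power at
    frequency index k decays as k**(-alpha), and the added Gaussian
    noise is white with power sigma**2 in every bin.
    """
    k = np.arange(1, num_freqs + 1, dtype=float)
    signal_power = k ** (-alpha)   # assumed power-law spectrum
    noise_power = sigma ** 2       # flat spectrum of white noise
    return int(np.sum(signal_power > noise_power))

if __name__ == "__main__":
    for sigma in (0.01, 0.1, 0.5):
        print(sigma, surviving_frequencies(sigma))
```

Under these assumptions, small noise leaves every bin visible, while large noise leaves only the lowest frequencies, which correspond to the slowly varying, global shape of a trajectory. That matches the article's description of high noise levels exposing task-level structure and low noise levels exposing local motion detail.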

Ambient Diffusion Policy exploits this property by letting suboptimal data contribute only at high and low diffusion times, where its distributional differences from high-quality data are either masked by noise (at high noise levels) or confined to local motion features (at low noise levels). According to the authors, the implementation requires only a single change to the Diffusion Policy dataloader.
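One way such a dataloader change might look is sketched below in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sample_diffusion_times`, the thresholds `t_low` and `t_high`, and the convention that diffusion time runs from 0 (no noise) to 1 (maximum noise) are hypothetical, since the article does not specify how the restriction is implemented.

```python
import numpy as np

def sample_diffusion_times(is_suboptimal, t_low=0.2, t_high=0.8, rng=None):
    """Sample one diffusion time per training example.

    High-quality examples draw t uniformly from [0, 1).
    Suboptimal examples draw t only from [0, t_low) or [t_high, 1),
    so they never contribute at intermediate noise levels.
    The thresholds are hypothetical values for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    is_suboptimal = np.asarray(is_suboptimal, dtype=bool)
    n = is_suboptimal.shape[0]

    # Times for high-quality data: the full range.
    t_clean = rng.uniform(0.0, 1.0, size=n)

    # Times for suboptimal data: sample on the combined length of the
    # two allowed intervals, then shift the upper part past the gap.
    allowed_length = t_low + (1.0 - t_high)
    u = rng.uniform(0.0, allowed_length, size=n)
    t_sub = np.where(u < t_low, u, t_high + (u - t_low))

    return np.where(is_suboptimal, t_sub, t_clean)

if __name__ == "__main__":
    flags = np.array([False, False, True, True])
    print(sample_diffusion_times(flags, rng=np.random.default_rng(0)))
```

In this sketch the rest of the training loop is unchanged: each batch element is noised and denoised at its sampled time, and only the time-sampling step depends on the data-quality flag.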

Experimental Results

The method is validated on four types of suboptimal action data (noisy trajectories, sim-to-real gaps, task mismatch, and large-scale data mixtures) across six robotic tasks. The reported results indicate that it learns effectively from all four kinds of suboptimal data.

Notably, when scaled to Open X-Embodiment — a massive dataset with heterogeneous data quality and unstructured distribution shifts — Ambient Diffusion Policy outperforms existing co-training baselines by up to 33%, demonstrating its practical value for real-world robotic learning.

The work, whose authors include Russ Tedrake and Constantinos Daskalakis, expands the set of usable data sources in robotics and reduces dependence on costly expert demonstrations, which could accelerate the development of general-purpose robotic policies.
