A team of researchers from MIT CSAIL has introduced Ambient Diffusion Policy, a simple yet principled method for imitation learning from suboptimal robot data. The approach addresses one of the most pressing challenges in robotics: how to make effective use of abundant, lower-quality demonstration data alongside scarce, high-quality expert demonstrations.
High-quality, task-specific robot data is expensive and time-consuming to collect, while suboptimal datasets — including noisy trajectories, simulation data, cross-embodied demonstrations, and large-scale heterogeneous collections like Open X-Embodiment — are widely available but difficult to use effectively.
The Key Insight: Noise-Dependent Data Usage
The researchers observed that Diffusion Policy learns different features of robot action data at different noise levels, driven by a spectral power law in action data. At high noise levels, the optimal denoiser learns global task-level structure; at low noise levels, it refines local motion-level primitives.
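One way to build intuition for this claim is a toy spectral calculation. The sketch below is an illustration only, not the authors' derivation: it assumes signal power decays as k^(-alpha) with frequency index k, that the added diffusion noise is white (flat across frequencies), and the function name, the exponent alpha, and the noise levels are all hypothetical choices. It reports the highest frequency whose signal power still exceeds the noise floor.

```python
import numpy as np

def highest_visible_frequency(sigma, alpha=2.0, num_freqs=256):
    """Largest frequency index whose signal power exceeds the noise power.

    Assumptions (illustrative, not from the source): the signal power
    spectrum is k**(-alpha) and the noise is white with variance sigma**2.
    """
    k = np.arange(1, num_freqs + 1, dtype=float)
    signal_power = k ** (-alpha)      # power-law decay with frequency
    noise_power = sigma ** 2          # white noise: same power at every k
    visible = signal_power > noise_power
    return int(k[visible].max()) if visible.any() else 0

# More noise hides more of the high-frequency (local) content,
# leaving only low-frequency (global) structure above the noise floor.
for sigma in (0.5, 0.1, 0.01):
    print(f"sigma={sigma}: highest visible frequency index = "
          f"{highest_visible_frequency(sigma)}")
```

Under these assumptions, raising the noise level lowers the cutoff, which matches the picture of a denoiser that can only recover coarse, task-level structure at high noise and fine, motion-level detail at low noise.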
Ambient Diffusion Policy leverages this property by letting suboptimal data contribute only at high and low diffusion times, where its distributional differences from high-quality data are either masked by noise (at high noise levels) or limited to local motion features (at low noise levels). High-quality data is still used at every noise level. The implementation is simple: it requires only a single change to the Diffusion Policy dataloader.
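The idea of noise-dependent data usage can be sketched as a change to how diffusion times are sampled for each training example. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes diffusion time runs from 0 (no noise) to 1 (pure noise), and the function name `sample_diffusion_times`, the per-sample `is_clean` flag, and the thresholds `t_low` and `t_high` are hypothetical placeholders that would need tuning per dataset.

```python
import numpy as np

def sample_diffusion_times(is_clean, t_low=0.2, t_high=0.7, rng=None):
    """Sample one diffusion time per training example.

    Hypothetical convention: t = 0 is noise-free, t = 1 is pure noise.
    - High-quality samples may be trained at any time in [0, 1).
    - Suboptimal samples are restricted to low times (t < t_low) and
      high times (t >= t_high); the middle band is reserved for clean data.
    """
    rng = np.random.default_rng() if rng is None else rng
    is_clean = np.asarray(is_clean, dtype=bool)
    n = is_clean.shape[0]

    # Clean data: uniform over the full range of diffusion times.
    t_clean = rng.uniform(0.0, 1.0, size=n)

    # Suboptimal data: uniform over the union of the two allowed bands,
    # drawn by sampling their total length and shifting past the gap.
    allowed_length = t_low + (1.0 - t_high)
    u = rng.uniform(0.0, allowed_length, size=n)
    t_suboptimal = np.where(u < t_low, u, u - t_low + t_high)

    return np.where(is_clean, t_clean, t_suboptimal)

# Usage: a batch with two expert and three suboptimal demonstrations.
flags = np.array([True, True, False, False, False])
times = sample_diffusion_times(flags, rng=np.random.default_rng(0))
print(times)
```

In a training loop, the returned times would replace the uniformly sampled timestep for each example before noise is added to the action sequence, so the rest of the pipeline stays unchanged.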
Experimental Results
The method is validated on six robotic tasks covering four types of suboptimal action data: noisy trajectories, sim-to-real gap, task mismatch, and large-scale data mixtures. The results indicate that it learns effectively from each of these sources of suboptimal data.
Notably, when scaled to Open X-Embodiment — a massive dataset with heterogeneous data quality and unstructured distribution shifts — Ambient Diffusion Policy outperforms existing co-training baselines by up to 33%, demonstrating its practical value for real-world robotic learning.
The work, whose authors include Russ Tedrake and Constantinos Daskalakis, expands the set of usable data sources in robotics and reduces the dependence on expensive expert demonstrations, potentially accelerating the development of general-purpose robotic policies.



