Research | June 21, 2026 | Embodied Global Team

MemoryWAM: New World Action Model with Persistent Memory Achieves Breakthrough in Long-Horizon Robot Manipulation

Researchers introduce MemoryWAM, a world action model with a hybrid memory structure that combines recent frames, event boundary anchors, and compact gist tokens. It enables long-horizon robot manipulation with lower inference latency and GPU memory usage, and it outperforms VLA and WAM baselines in both simulation and real-world tests.

#MemoryWAM #world action model #robot manipulation #persistent memory #arXiv #research #embodied AI

A new research paper introduces MemoryWAM, an efficient world action model with persistent memory designed for long-horizon robot manipulation tasks. Published on arXiv in June 2026, the work addresses a fundamental trade-off in world action models (WAMs). Efficient methods typically condition on a limited window of recent observations and struggle in non-Markovian environments, where the current observation alone does not determine the right action. Methods that retain long-term history avoid that problem but incur prohibitive time and memory costs.

MemoryWAM employs a hybrid memory structure that integrates three types of information: recent frames for fine-grained short-term context, event boundary anchor frames capturing key transition moments, and compact 'gist tokens' that summarize long-range historical information. A custom attention mechanism simultaneously retrieves detailed short-term context and highly compressed long-term context, significantly reducing inference latency and GPU memory usage while supporting memory-dependent decision-making.
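The three memory types described above can be illustrated with a small bookkeeping sketch. This is not the paper's implementation: the class name `HybridMemory`, the parameters `recent_size` and `gist_chunk`, and the use of mean pooling to build a gist are all assumptions made for illustration. In particular, the article does not say how gist tokens are produced or how event boundaries are detected, and the custom attention mechanism is not modeled here. The sketch only shows how the context handed to a model could stay small as the episode grows.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Average equal-length feature vectors into one vector.

    Stand-in for a gist summarizer; this is an assumption, not the
    method used by MemoryWAM.
    """
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


@dataclass
class HybridMemory:
    """Hypothetical container for recent frames, anchors, and gists."""

    recent_size: int = 4  # assumed short-term window length
    gist_chunk: int = 8   # assumed number of evicted frames per gist token
    recent: Deque[Vector] = field(default_factory=deque)
    anchors: List[Vector] = field(default_factory=list)
    gists: List[Vector] = field(default_factory=list)
    _pending: List[Vector] = field(default_factory=list)

    def add(self, frame: Vector, is_event_boundary: bool = False) -> None:
        # Event boundary frames are kept at full detail as anchors.
        if is_event_boundary:
            self.anchors.append(frame)
        # Every frame enters the short-term window.
        self.recent.append(frame)
        # Frames that fall out of the window wait to be compressed.
        if len(self.recent) > self.recent_size:
            self._pending.append(self.recent.popleft())
        # Once enough frames have been evicted, fold them into one gist.
        if len(self._pending) == self.gist_chunk:
            self.gists.append(mean_pool(self._pending))
            self._pending = []

    def context(self) -> List[Vector]:
        # Long-range gists first, then anchors, then recent detail.
        # Frames still waiting for compression are omitted in this sketch.
        return self.gists + self.anchors + list(self.recent)


mem = HybridMemory()
for t in range(100):
    # Boundaries at t=30 and t=70 are arbitrary choices for the demo.
    mem.add([float(t), 1.0], is_event_boundary=t in (30, 70))

print(len(mem.context()))  # 18 context entries instead of 100 raw frames
```

With these assumed settings, 100 observed frames reduce to 12 gists, 2 anchors, and 4 recent frames. The context size grows by one entry per `gist_chunk` frames rather than one per frame, which is the kind of saving the article attributes to the hybrid structure.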

The authors evaluated the model on a series of long-horizon, memory-dependent manipulation tasks in both simulation and real-world environments. According to the paper, MemoryWAM significantly outperforms strong Vision-Language-Action (VLA) models and several WAM baselines while keeping inference latency and GPU memory usage low.

The work is a step toward robots that can operate in complex, real-world environments where a task requires remembering what happened many steps earlier. The hybrid memory approach offers a practical way to handle growing context length in embodied AI systems, which has been a major bottleneck for deploying foundation models on physical robots.
