The embodied AI field is shifting toward a vision-centric sensing paradigm. As the highest-density information modality for robot perception and the most natural interface for human-robot interaction, vision is central to general-purpose robot intelligence and to seamless sim-to-real transfer.
However, researchers pursuing this path face a difficult trade-off between visual fidelity and training speed: high-fidelity rendering incurs large computational and memory overhead, manual scene modeling is slow and labor-intensive, and the compatibility limitations of existing platforms constrain what can be built.
To overcome these core challenges constraining embodied AI development, Tsinghua AIR DISCOVER Lab, in collaboration with MouXianfei Technology, Yuanli Lingji, Qiuzhi Technology, and DiGu Robots, proposed GS-Playground, a universal multimodal simulation framework.
As a next-generation simulation infrastructure built specifically for vision-centric robot learning, GS-Playground achieves, for the first time, deep integration of high-throughput parallel physics simulation with high-fidelity visual rendering. It maintains the precision and stability required of a physics simulator while providing the rendering efficiency and environment support needed for large-scale vision-driven policy training and sim-to-real transfer.
**Universal Full-Scene Native Compatibility**
GS-Playground is designed as a universal full-scene embodied AI simulation platform. The platform's core is a self-developed cross-platform parallel physics engine that natively supports both CPU and GPU backends and runs on Windows, Linux, and macOS. This enables out-of-the-box adaptation to all robot types, including quadruped robots, full-size humanoid robots, and multi-DOF industrial robotic arms.
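The parallel-simulation idea behind such an engine can be sketched generically: many independent environments are stepped in lock-step with a single vectorized update. The snippet below is a minimal NumPy illustration of batched point-mass dynamics, not GS-Playground's actual engine or API.

```python
import numpy as np

def step_batch(pos, vel, force, dt, mass=1.0):
    """Advance a batch of independent point-mass environments one timestep
    with semi-implicit Euler. All arrays have shape (num_envs, dim)."""
    vel = vel + (force / mass) * dt   # velocity update from current forces
    pos = pos + vel * dt              # position update with the new velocity
    return pos, vel

# 1,024 environments simulated in lock-step by one vectorized call per step.
num_envs, dim = 1024, 3
pos = np.zeros((num_envs, dim))
vel = np.zeros((num_envs, dim))
gravity = np.tile([0.0, 0.0, -9.81], (num_envs, 1))

for _ in range(100):                  # 100 steps of dt = 0.002 s (0.2 s total)
    pos, vel = step_batch(pos, vel, gravity, dt=0.002)
```

The same pattern maps to a GPU backend by swapping the array library, which is one common way batched simulators exploit hardware parallelism.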
**High-Performance Parallel Physics Engine**
For vision-centric robot learning, seeing is only the first step. What truly determines whether a policy can transfer to the real world is whether the simulation system can continuously provide stable, credible physical feedback during complex contacts, friction, collisions, and multi-body coupling.
GS-Playground addresses this core bottleneck with a high-performance parallel physics engine developed from the ground up. In the Franka Panda dynamic grasping test, GS-Playground's CPU backend achieved a 90/90 success rate at both 0.002 s and 0.01 s timesteps, significantly outperforming mainstream solutions such as MuJoCo, IsaacSim, and Genesis.
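Why the 0.01 s result is notable can be seen with a toy experiment: stiff contact-like dynamics that a naive integrator cannot handle at large timesteps. The sketch below compares explicit and semi-implicit Euler on a stiff spring; it is a generic numerical-stability illustration, not GS-Playground's solver.

```python
def simulate(dt, steps, k=1000.0, m=1.0, x0=0.1, semi_implicit=True):
    """Integrate an undamped stiff spring (a stand-in for stiff contact)
    and return the largest |x| observed, a simple proxy for stability."""
    x, v = x0, 0.0
    max_abs_x = abs(x)
    for _ in range(steps):
        a = -(k / m) * x
        if semi_implicit:
            v += a * dt          # update velocity first...
            x += v * dt          # ...then position with the new velocity
        else:
            x_new = x + v * dt   # explicit Euler uses the old velocity
            v += a * dt
            x = x_new
        max_abs_x = max(max_abs_x, abs(x))
    return max_abs_x

# At a 0.01 s timestep the semi-implicit run stays bounded near the
# 0.1 m initial amplitude, while explicit Euler diverges.
stable = simulate(dt=0.01, steps=500, semi_implicit=True)
unstable = simulate(dt=0.01, steps=500, semi_implicit=False)
```

Maintaining accurate grasping at a 5x larger timestep means proportionally fewer solver steps per second of simulated time, which compounds directly with the engine's parallel throughput.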
**Memory-Efficient Batch 3DGS Rendering**
Rendering thousands of high-fidelity 3DGS scenes simultaneously poses severe memory and compute challenges, and has been the core bottleneck for large-scale vision-driven robot training.
On a single NVIDIA RTX 4090 GPU, the renderer achieves breakthrough throughput of up to 10,000 FPS at 640×480 resolution, capable of simultaneously rendering up to 2,048 scenes.
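Taken at face value, these figures imply the following aggregate pixel throughput and frame-buffer footprint. This is illustrative arithmetic only; the announcement does not state whether the 10,000 FPS figure is aggregate or per-scene, so aggregate is assumed here, as is a float32 RGB buffer layout.

```python
# Back-of-envelope check of the quoted rendering figures.
width, height = 640, 480
fps_total = 10_000            # assumed aggregate frames per second
num_scenes = 2_048

pixels_per_second = width * height * fps_total   # ~3.07e9 pixels/s
fps_per_scene = fps_total / num_scenes           # ~4.9 FPS each if all 2,048 render at once

# One float32 RGB frame buffer per scene (3 channels x 4 bytes):
bytes_per_frame = width * height * 3 * 4
batch_buffer_gb = num_scenes * bytes_per_frame / 1024**3   # ~7.03 GiB
```

The ~7 GiB of output buffers alone, before Gaussian parameters or physics state, shows why memory-efficient batching is the hard part of rendering 2,048 scenes on a single 24 GB card.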
**Automated Real2Sim Workflow**
GS-Playground designed a fully automated image-to-physics Real2Sim workflow. With only a single RGB image as input, it can complete the full-process creation of Sim-Ready digital assets within minutes, achieving rapid conversion from real scenes to digital twins while ensuring visual realism and physical consistency.
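A workflow of this shape can be sketched as a staged pipeline that threads one asset record through geometry, appearance, and physics stages. Every stage name, field, and interface below is a hypothetical illustration of the pattern, not GS-Playground's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class SimReadyAsset:
    """Accumulates the outputs of each Real2Sim stage (illustrative fields)."""
    mesh_path: str = ""
    gaussians_path: str = ""
    mass: float = 0.0
    friction: float = 0.5
    log: list = field(default_factory=list)

def reconstruct_geometry(image_path, asset):
    # hypothetical stage: single-view 3D reconstruction of object geometry
    asset.mesh_path = image_path.replace(".png", ".obj")
    asset.log.append("geometry")
    return asset

def fit_gaussians(asset):
    # hypothetical stage: fit a 3DGS appearance model to the reconstruction
    asset.gaussians_path = asset.mesh_path.replace(".obj", ".splat")
    asset.log.append("appearance")
    return asset

def estimate_physics(asset):
    # hypothetical stage: infer plausible mass/friction for the simulator
    asset.mass = 1.0
    asset.log.append("physics")
    return asset

def real2sim(image_path):
    """Run the staged pipeline: one RGB image in, one Sim-Ready asset out."""
    asset = SimReadyAsset()
    for stage in (lambda a: reconstruct_geometry(image_path, a),
                  fit_gaussians,
                  estimate_physics):
        asset = stage(asset)
    return asset

asset = real2sim("cup.png")
```

The value of the staged design is that each stage enforces one consistency property (geometry, appearance, or physical plausibility), so the final asset is both visually realistic and physically usable in the simulator.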
This achievement has been accepted by RSS 2026 (Robotics: Science and Systems), a top international academic conference in the robotics field.