Researchers at KAIST's School of Electrical Engineering, led by Professor Yoo Chang-dong, have developed VOTP (Video Optimal TransPort), a breakthrough physical AI learning technology that infers the evaluation criteria a human intends from a small number of example videos. The research has been selected as a major presentation paper at ICML 2026 (the International Conference on Machine Learning).
VOTP addresses a fundamental bottleneck in physical AI: designing precise reward functions that judge whether a physical action is successful or safe. Traditionally, building such functions has required extensive manual engineering or constant human feedback. VOTP automates the process by analyzing high-level behavioral patterns in both 'good' and 'bad' example videos, then deriving criteria that can be applied to novel, unscored actions.
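The article does not describe VOTP's algorithm, so the following is only a minimal sketch of the general idea, assuming (from the name alone) that optimal transport is used to compare videos. Each clip is represented here as an array of per-frame feature vectors, standing in for the output of some video encoder. A new clip is scored by how much cheaper it is to transport onto the 'good' examples than onto the 'bad' ones. All names (`sinkhorn_cost`, `preference_score`, `make_clip`) and the synthetic data are hypothetical and not taken from the research.

```python
import numpy as np

def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_cost(a, b, eps=0.1, n_iters=300):
    """Entropy-regularised optimal transport cost between two clips.

    a, b: arrays of shape [frames, dim]. Frames are treated as uniformly
    weighted points, so temporal order is ignored in this simplified sketch.
    """
    # Pairwise squared Euclidean distance between frame features.
    C = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    n, m = C.shape
    log_mu, log_nu = -np.log(n), -np.log(m)  # uniform marginals, in log space
    f, g = np.zeros(n), np.zeros(m)          # dual potentials
    for _ in range(n_iters):
        # Log-domain Sinkhorn updates: alternately match row and column marginals.
        f = eps * (log_mu - _logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_nu - _logsumexp((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)  # transport plan
    return float((P * C).sum())

def preference_score(query, good_clips, bad_clips, eps=0.1):
    """Higher when the query clip is closer to the good examples than the bad ones."""
    cost_good = np.mean([sinkhorn_cost(query, g, eps) for g in good_clips])
    cost_bad = np.mean([sinkhorn_cost(query, b, eps) for b in bad_clips])
    return float(cost_bad - cost_good)

def make_clip(end, rng, n_frames=20, noise=0.02):
    """Synthetic stand-in for video features: a noisy straight path from 0 to `end`."""
    t = np.linspace(0.0, 1.0, n_frames)[:, None]
    path = t * np.asarray(end, dtype=float)
    return path + noise * rng.standard_normal(path.shape)

rng = np.random.default_rng(0)
good_clips = [make_clip((1.0, 1.0), rng) for _ in range(3)]   # assumed 'good' behavior
bad_clips = [make_clip((1.0, -1.0), rng) for _ in range(3)]   # assumed 'bad' behavior

# Score two new, unscored clips: one resembling each group.
score_good_like = preference_score(make_clip((1.0, 0.9), rng), good_clips, bad_clips)
score_bad_like = preference_score(make_clip((1.0, -0.9), rng), good_clips, bad_clips)
print(score_good_like, score_bad_like)
```

In this toy setup, the score acts as a stand-in reward: no hand-written rule says what 'good' means, only the example clips do. A real system would need learned video features and some handling of temporal order, neither of which this sketch attempts.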
The framework works across diverse hardware platforms, including humanoid robots, surgical robotic arms, and autonomous navigation systems in smart factories. By learning human intentions from minimal visual data, VOTP significantly reduces the training overhead for complex robotic tasks, enabling machines to align their movements with human preferences without large, exhaustively labeled training datasets.
This advancement comes as the AI industry shifts from generative models toward physical agents that interact with the real world, where efficient, intention-aligned machine learning serves as a core mechanism for deploying autonomous systems in unpredictable environments.



