Research | June 12, 2026 | Embodied Global Team

KAIST's VOTP Method Enables Robots to Learn Human Judgment from Just 10 Videos

KAIST's VOTP method, accepted for an Oral presentation at ICML 2026, lets robots learn human judgment from just 10 labeled videos by using optimal transport to infer preferences for unlabeled video pairs, sharply reducing annotation costs.

#kaist #votp #icml-2026 #preference-learning #reinforcement-learning #embodied-ai

Researchers from KAIST have developed VOTP (Video-based Optimal Transport Preference labeling), a method that enables robots to learn human judgment criteria from just 10 labeled videos. The research was accepted to ICML 2026 and selected for an Oral presentation, a distinction given to only 168 papers out of 23,918 submissions (0.7%).

Current preference-based reinforcement learning requires hundreds or thousands of human comparisons to train a reward function. VOTP addresses this bottleneck by using optimal transport to infer preferences for unlabeled video pairs from a small labeled set, so that only a handful of comparisons need to come from a human.

In experiments, policies trained with VOTP using only 10 labels outperformed policies trained with ground-truth rewards. On real tabletop robot tasks using a Rethink Sawyer arm, VOTP achieved an 80% success rate on LiftBanana and a 70% success rate on DrawerOpen with just 5 to 10 preference labels.

Professor Chang D. Yoo stated: "Since VOTP can learn human judgment criteria with only a small number of videos, it is a core technology that will accelerate the era of robots making human-like judgments."
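
For intuition about how optimal transport can spread a few preference labels across unlabeled data, the following Python sketch may help. It is not the VOTP algorithm, whose details this article does not describe. It is a generic illustration under stated assumptions: each video pair is represented by a hypothetical feature vector, each labeled pair carries a preference score (1.0 means the first video is preferred, 0.0 the second), and an entropic-regularized transport plan computed with Sinkhorn iterations carries those scores onto the unlabeled pairs. The function names, the toy features, and the regularization value are all illustrative.

```python
import math


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic-regularized transport plan between two uniform distributions.

    cost[i][j] is the cost of moving mass from labeled item i to unlabeled item j.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on labeled items
    b = [1.0 / m] * m  # uniform mass on unlabeled items
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def squared_distance(x, y):
    return sum((p - q) ** 2 for p, q in zip(x, y))


def propagate_preferences(labeled_feats, labels, unlabeled_feats, reg=0.1):
    """Assign each unlabeled item the plan-weighted average of the known labels.

    Assumption for illustration only: nearby feature vectors share preferences.
    """
    cost = [[squared_distance(x, y) for y in unlabeled_feats] for x in labeled_feats]
    plan = sinkhorn_plan(cost, reg)
    soft_labels = []
    for j in range(len(unlabeled_feats)):
        column_mass = sum(plan[i][j] for i in range(len(labels)))
        weighted = sum(plan[i][j] * labels[i] for i in range(len(labels)))
        soft_labels.append(weighted / column_mass)
    return soft_labels


# Toy data (hypothetical): two labeled pairs and three unlabeled pairs.
labeled_feats = [(0.0, 0.0), (1.0, 1.0)]
labels = [1.0, 0.0]
unlabeled_feats = [(0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]

soft = propagate_preferences(labeled_feats, labels, unlabeled_feats)
print([round(s, 3) for s in soft])
```

In this toy setup the first unlabeled pair sits next to the pair labeled 1.0 and receives a score close to 1, the second sits next to the pair labeled 0.0 and receives a score close to 0, and the third is equidistant from both and lands at about 0.5. A real system would replace the hand-written features with learned video representations and would likely rely on an established optimal transport library rather than this plain-Python loop.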
