GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M B Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang, Xiaoran Chen, Suman Saha, Marco Cannici, Elie Aljalbout, Botao Ye, Xi Wang, Aram Davtyan, Mathieu Salzmann, Davide Scaramuzza, Marc Pollefeys, Paolo Favaro, Alexandre Alahi
arXiv, 2024
A multimodal, multi-domain ego-vision world model with precise control over object dynamics, ego-agent motion, and human poses.