Aram Davtyan

I am a Postdoctoral Researcher in the Computer Vision Group at the University of Bern. I earned my Ph.D. in Computer Science from the University of Bern in 2024, where I was supervised by Prof. Dr. Paolo Favaro. Prior to that, I completed a Specialist degree (equivalent to B.S. + M.S.) in Fundamental Mathematics and Mechanics at MSU in 2020. Additionally, I graduated from YSDA in 2018. My research interests include Machine Learning, Computer Vision, Generative AI, and World Models.


Publications

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M B Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang, Xiaoran Chen, Suman Saha, Marco Cannici, Elie Aljalbout, Botao Ye, Xi Wang, Aram Davtyan, Mathieu Salzmann, Davide Scaramuzza, Marc Pollefeys, Paolo Favaro, Alexandre Alahi
arXiv, 2024

A multi-modal and multi-domain ego-vision world model with precise control over object dynamics, ego-agent motion and human poses.

CAGE: Unsupervised Visual Composition and Animation for Controllable Video Generation

AAAI, 2025

A model to compose and animate scenes from sparse sets of visual features.

Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation

Multi-View Unsupervised Image Generation with Cross Attention Guidance

arXiv, 2023

Single image to novel view synthesis without any supervision.

Efficient Video Prediction via Sparsely Conditioned Flow Matching

Aram Davtyan, Sepehr Sameni, Paolo Favaro
ICCV, 2023

Conditioning on only a few randomly chosen past frames at each denoising step of flow matching results in a more efficient training procedure.

Controllable Video Generation through Global and Local Motion Dynamics

Aram Davtyan, Paolo Favaro
ECCV, 2022

A model to discover agents' action spaces from a dataset of videos in an unsupervised way. The action spaces are decomposed into global (2D shifts) and local (discrete) actions.