Aram Davtyan

I am a Postdoctoral Researcher in the Computer Vision Group at the University of Bern, working on generative AI, controllable video generation, and world models. I am especially interested in models that learn reusable world representations from visual experience, going beyond plausible synthesis to support counterfactual intervention, generalization, visual intelligence, and adaptation to new tasks. I earned my Ph.D. in Computer Science from the University of Bern in 2024, advised by Prof. Dr. Paolo Favaro. Before that, I completed a Specialist degree in Fundamental Mathematics and Mechanics at Lomonosov Moscow State University and studied data analysis at YSDA.

Download CV Scholar GitHub LinkedIn

News

Recent Updates

Jun 2026

World Model Self-Distillation is now available on arXiv.

Mar 2026

COMiT introduces a compact, structured image representation built from sequential crop aggregation.

Dec 2025

Received an ARC Prize Honorable Mention for research on repurposing video diffusion models for novel visual and logic tasks.

Sep 2025

I gave a talk at the Swiss AI Weeks introducing GEM and other work built using Swiss AI compute.

Jan 2025

Started as a Postdoc in the Computer Vision Group at the University of Bern.

Dec 2024

Successfully defended my PhD at the University of Bern! (Thesis)

Publications

arXiv, 2026

World Model Self-Distillation: Training World Models to Solve General Tasks

Sebastian Stapf, Pablo Acuaviva, Aram Davtyan, Paolo Favaro

A new way to train task-solving video world models without paired demonstrations.

Project PDF arXiv Code

ICML, 2026

Rethinking Visual Intelligence: Insights from Video Pretraining

Pablo Acuaviva, Aram Davtyan, Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Alexandre Alahi, Paolo Favaro

The second version of the Gen2Gen paper, in which we demonstrate that video diffusion models (VDMs) are more data-efficient than LLMs at learning new visual tasks.

ARC Prize 2025 Honorable Mention.

PDF arXiv

ICLR, 2026

Composition of Memory Experts for Diffusion World Models

Sebastian Stapf, Pablo Acuaviva, Aram Davtyan, Paolo Favaro

A diffusion-based world modeling framework that integrates heterogeneous memory models through a contrastive product-of-experts formulation.

Project

arXiv, 2026

Communication-Inspired Tokenization for Structured Image Representations

Aram Davtyan, Yusuf Sahin, Yasaman Haghighi, Sebastian Stapf, Pablo Acuaviva, Alexandre Alahi, Paolo Favaro

A new way to represent images as discrete sequences of tokens by sequentially integrating information from image crops, yielding semantically meaningful structured representations.

Project PDF arXiv Code

IWSDS, 2026

Learning Vision-Language Alignment in Unified LLMs with 24 Text Tokens per Image

Nicola Irmiger, Yixuan Xu, Raphael Kreft, Aram Davtyan, Manuel Kaufmann, Imanol Schlag

Unpaired image adaptation of a pre-trained language model followed by lightweight image–text alignment enables multimodal understanding while preserving language capabilities.

Project

arXiv, 2025

From Generation to Generalization: Emergent Few-Shot Learning in Video Diffusion Models

Pablo Acuaviva, Aram Davtyan, Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Alexandre Alahi, Paolo Favaro

A few-shot fine-tuning framework that repurposes VDMs for new tasks using only a handful of examples.

Project PDF arXiv Code

NeurIPS, 2025

KOALA++: Efficient Kalman-Based Optimization of Neural Networks with Gradient-Covariance Products

Zixuan Xia, Aram Davtyan, Paolo Favaro

An extension of KOALA, a neural network optimization algorithm based on Kalman filtering, with an implicit full weight covariance matrix.

PDF arXiv

ICCVW, 2025

MIRAGE: Unsupervised Single Image to Novel View Generation with Cross Attention Guidance

Llukman Cerkezi, Aram Davtyan, Sepehr Sameni, Paolo Favaro

Single-image to novel-view synthesis without any supervision.

PDF arXiv

ICLR, 2025

Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling

Aram Davtyan, Leello Tadesse Dadi, Volkan Cevher, Paolo Favaro

A method that straightens sampling trajectories in the flow matching framework by storing and exchanging locally optimal data-noise couplings across minibatches.

Project Code

CVPR, 2025

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M B Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang, Xiaoran Chen, Suman Saha, Marco Cannici, Elie Aljalbout, Botao Ye, Xi Wang, Aram Davtyan, Mathieu Salzmann, Davide Scaramuzza, Marc Pollefeys, Paolo Favaro, Alexandre Alahi

A multi-modal and multi-domain ego-vision world model with precise control over object dynamics, ego-agent motion and human poses.

Project PDF arXiv Code

AAAI, 2025

CAGE: Unsupervised Visual Composition and Animation for Controllable Video Generation

Aram Davtyan, Sepehr Sameni, Björn Ommer, Paolo Favaro

A model to compose and animate scenes from sparse sets of visual features.

Project PDF arXiv Code

AAAI, 2024

Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation

Aram Davtyan, Paolo Favaro

A model to animate single frames with sparse motion control.

Project PDF arXiv Code

ICCV, 2023

Efficient Video Prediction via Sparsely Conditioned Flow Matching

Aram Davtyan, Sepehr Sameni, Paolo Favaro

Conditioning only on a few randomly chosen past frames at each denoising step of flow matching results in a more efficient training procedure.

Project PDF arXiv Code

ECCV, 2022

Controllable Video Generation through Global and Local Motion Dynamics

Aram Davtyan, Paolo Favaro

A model to discover agents' action spaces from a dataset of videos in an unsupervised way. The action spaces are decomposed into global (2D shifts) and local (discrete) actions.

Project PDF arXiv Code

AAAI, 2022

KOALA: A Kalman Optimization Algorithm with Loss Adaptivity

Aram Davtyan, Sepehr Sameni, Llukman Cerkezi, Givi Meishvili, Adam Bielski, Paolo Favaro

A neural network optimization algorithm based on Kalman filtering.

Project PDF arXiv Code

Talks

Invited Talks and Presentations

Sep 2025

Swiss AI Initiative: Our Experience at Computer Vision Group @ UniBE

AI for SMEs: What can small businesses really do with artificial intelligence?, Swiss AI Weeks, Bern

Sep 2024

Unsupervised Controllable Video Generation

Invited Seminar, Computer Vision and Geometry Group, ETH Zurich, Zurich

Sep 2023

Efficient Video Prediction via Sparsely Conditioned Flow Matching

Nectar Track Oral Presentation, GCPR 2023, Heidelberg

Teaching

Courses

Spring 2026

Lecturer

Seminar Machine Learning and Artificial Intelligence | University of Bern

Fall 2025

Lecturer

Foundations of Deep Learning | University of Bern

2023-2025

Teaching Assistant

Deep Learning | University of Bern

2020-2024

Teaching Assistant

Machine Learning | University of Bern

Fall 2021

Teaching Assistant

Seminar Self-Supervised Learning in Computer Vision | YSDA

Spring 2021

Teaching Assistant

Advanced Topics in Machine Learning | University of Bern

Awards

Dec 2025

ARC Prize Honorable Mention

Recognized for research showing that video diffusion models can be repurposed to solve novel visual and logic tasks from only a handful of examples.

Service

Reviewer

ICLR, ICML, CVPR, NeurIPS, ICCV, ECCV