Xiaoyan Cong

Ms. Forcing: Efficient Streaming Video Generation with Multi-Scale Patchification and Attention

Zekun Li*, Xiaoyan Cong*, Hongyu Li, Zhiyang Dou, Chuan Guo, Abhay Mittal, Sizhe An, Srinath Sridhar

TL;DR We propose a real-time streaming video generation framework with multi-scale patchification and attention, achieving high-quality generation at 22.8 FPS on a single H200 GPU.

Preprint arXiv Project

VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization

Xiaoyan Cong, Haotian Yang, Angtian Wang, Yizhi Wang, Yiding Yang, Canyu Zhang, Chongyang Ma

TL;DR We propose a scalable instruction-based video editing model, the first to introduce RL post-training to the video editing domain.

CVPR 2026 arXiv Project

UMO: Unified In-Context Learning Unlocks Motion Foundation Model Priors

Xiaoyan Cong*, Zekun Li*, Zhiyang Dou, Hongyu Li, Omid Taheri, Chuan Guo, Abhay Mittal, Sizhe An, Taku Komura, Wojciech Matusik, Michael J. Black, Srinath Sridhar

TL;DR We propose a unified formulation that unlocks the generative priors of a text-to-motion foundation model for diverse downstream tasks.

ECCV 2026 arXiv Project

Deform360: A Massive Multi-view Visuotactile Dataset for Deformable World Models

Hongyu Li, Wanjia Fu, Xiaoyan Cong, Zekun Li, Binghao Huang, Hanxiao Jiang, Xintong He, Yiqing Liang, Rao Fu, Tao Lu, Srinath Sridhar, Kevin A. Smith, George Konidaris, Yunzhu Li

ECCV 2026 Google Drive

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

Hongyang Du*, Junjie Ye*, Xiaoyan Cong*, Runhao Li, Jingcheng Ni, Aman Agarwal, Zeqi Zhou, Zekun Li, Randall Balestriero, Yue Wang

TL;DR We propose an RL post-training framework that aligns video diffusion models toward 3D geometric consistency via DPO.

ICML 2026 arXiv Project

PackUV: Packed Gaussian UV Maps for 4D Volumetric Video

Aashish Rai, Angela Xing, Anushka Agarwal, Xiaoyan Cong, Zekun Li, Tao Lu, Srinath Sridhar

TL;DR We propose a volumetric video representation that maps 4D Gaussians onto UV atlases for codec-compatible, streamable playback, along with the largest multi-view 4D dataset.

CVPR 2026 arXiv Project

DyTact: Capturing Dynamic Contacts in Hand-Object Manipulation with Deformable Gaussians

Xiaoyan Cong, Angela Xing, Chandradeep Pokhariya, Rao Fu, Srinath Sridhar

TL;DR Markerless capture of dynamic hand-object contacts with deformable Gaussians.

3DV 2026 Oral arXiv Project

Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Xiaoyan Cong, Jiayi Shen, Zekun Li, Rao Fu, Tao Lu, Srinath Sridhar

TL;DR We propose a training-free solution for reconstructing geometry from flat-colored images.

CVPRW 2025 Oral arXiv Project Course Project of CSCI1430

GenHSI: Controllable Generation of Human-Scene Interaction Videos

Zekun Li, Rui Zhou, Rahul Sajnani, Xiaoyan Cong, Daniel Ritchie, Srinath Sridhar

TL;DR We propose chain-of-frame prompting for video diffusion models (VDMs) to generate plausible human-scene interaction (HSI) videos without training.

WACV 2026 arXiv Project

Oscillation Inversion: Training-free Image and Video Enhancement through Oscillated Latents in Large Flow Models

Yan Zheng, Zhenxiao Liang, Xiaoyan Cong, Yi Yang, Lanqing Guo, Yuehao Wang, Peihao Wang, Zhangyang Wang

TL;DR Exploits oscillatory patterns in the inversion of flow models for training-free image and video enhancement.

AAAI 2026 Oral arXiv

4DRecons: 4D Neural Implicit Deformable Objects Reconstruction from a single RGB-D Camera with Geometrical and Topological Regularizations

Xiaoyan Cong, Haitao Yang, Liyan Chen, Kaifeng Zhang, Li Yi, Chandrajit L. Bajaj, Qixing Huang

TL;DR Reconstructs deformable 3D objects from a single RGB-D video with novel geometrical and topological regularizations.

CVM 2026 arXiv

Automatic Controllable Colorization by Imagination

Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei

TL;DR Automatic colorization via imagined reference images from generative models, enabling iterative editing.

CVPR 2024 arXiv Project

Xiaoyan Cong 丛箫言
