Xiaoyan Cong
Gansu Province, China
28th June 2024

Xiaoyan Cong 丛箫言

Email: xiaoyan_cong [at] brown [dot] edu

Office: 115 Waterman St, Providence, RI 02906

I am currently a 2nd-year CS Ph.D. student at Brown University, advised by Prof. Srinath Sridhar. I obtained my B.E. in Robotics Engineering with honors from Chu Kochen Honors College, Zhejiang University, in 2024. During my undergraduate studies, I was fortunate to be advised by Prof. Qixing Huang, Prof. Li Yi, Prof. Qifeng Chen, and Prof. Xiaowei Zhou.

I'm always open to collaboration, so please don't hesitate to contact me!

I am currently looking for a research internship for 2026. Please reach out if you think I could be a good fit for your team!

Research Interests

My research centers on multimodal foundation models for content creation across images, video, motion, and language. I am also broadly interested in world models and 3D/4D representations.

Publications

(* indicates equal contribution.)

UMO: Unified In-Context Learning Unlocks Motion Foundation Model Priors

Xiaoyan Cong*, Zekun Li*, Zhiyang Dou, Hongyu Li, Omid Taheri, Chuan Guo, Abhay Mittal, Sizhe An, Taku Komura, Wojciech Matusik, Michael J. Black, Srinath Sridhar

TL;DR We propose a unified formulation that unlocks the generative priors of a text-to-motion foundation model for diverse downstream tasks.

VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization

Xiaoyan Cong, Haotian Yang, Angtian Wang, Yizhi Wang, Yiding Yang, Canyu Zhang, Chongyang Ma

TL;DR We propose a scalable instruction-based video editing model, the first to introduce RL post-training to the video editing domain.

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

Hongyang Du*, Junjie Ye*, Xiaoyan Cong*, Runhao Li, Jingcheng Ni, Aman Agarwal, Zeqi Zhou, Zekun Li, Randall Balestriero, Yue Wang

TL;DR We propose an RL post-training framework that aligns video diffusion models toward 3D geometric consistency via DPO.

PackUV: Packed Gaussian UV Maps for 4D Volumetric Video

Aashish Rai, Angela Xing, Anushka Agarwal, Xiaoyan Cong, Zekun Li, Tao Lu, Srinath Sridhar

TL;DR We propose a volumetric video representation that maps 4D Gaussians onto UV atlases for codec-compatible, streamable playback, and introduce the largest multi-view 4D dataset.

DyTact: Capturing Dynamic Contacts in Hand-Object Manipulation with Deformable Gaussians

Xiaoyan Cong, Angela Xing, Chandradeep Pokhariya, Rao Fu, Srinath Sridhar

TL;DR Markerless capture of dynamic hand-object contacts with deformable Gaussians.

Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Xiaoyan Cong, Jiayi Shen, Zekun Li, Rao Fu, Tao Lu, Srinath Sridhar

TL;DR We propose a training-free solution for reconstructing geometry from flat-colored images.

GenHSI: Controllable Generation of Human-Scene Interaction Videos

Zekun Li, Rui Zhou, Rahul Sajnani, Xiaoyan Cong, Daniel Ritchie, Srinath Sridhar

TL;DR We propose chain-of-frame prompting for video diffusion models (VDMs) to generate plausible human-scene interaction (HSI) videos without training.

Oscillation Inversion: Training-free Image and Video Enhancement through Oscillated Latents in Large Flow Models

Yan Zheng, Zhenxiao Liang, Xiaoyan Cong, Yi Yang, Lanqing Guo, Yuehao Wang, Peihao Wang, Zhangyang Wang

TL;DR Exploits oscillatory patterns in the inversion of flow models for training-free image and video enhancement.

4DRecons: 4D Neural Implicit Deformable Objects Reconstruction from a single RGB-D Camera with Geometrical and Topological Regularizations

Xiaoyan Cong, Haitao Yang, Liyan Chen, Kaifeng Zhang, Li Yi, Chandrajit L. Bajaj, Qixing Huang

TL;DR Reconstructs deformable 3D objects from a single RGB-D video with novel geometrical and topological regularizations.

Automatic Controllable Colorization by Imagination

Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei

TL;DR Automatic colorization via imagined reference images from generative models, enabling iterative editing.

Experience

Research Scientist Intern
ByteDance Inc.
Jun 2025 - Jan 2026
San Jose, CA
Advisor: Chongyang Ma