Xiaoyan Cong
Gansu Province, China
28th June 2024

Xiaoyan Cong 丛箫言

Email: xiaoyan_cong [at] brown [dot] edu

Office: 115 Waterman St, Providence, RI 02906

I am currently a second-year CS Ph.D. student at Brown University, advised by Prof. Srinath Sridhar. I obtained my B.E. in Robotics Engineering from Zhejiang University in 2024, with honors from Chu Kochen Honors College. During my undergraduate studies, I was fortunate to be advised by Prof. Qixing Huang, Prof. Li Yi, Prof. Qifeng Chen, and Prof. Xiaowei Zhou.

I'm always open to collaboration -- please don't hesitate to contact me!

I am currently looking for a research internship for 2026. Please reach out if you think I could be a good fit for your team!

Research Interests

My research centers on multimodal foundation models for content creation across images, video, motion, and language. I am also broadly interested in world models and 3D/4D representations.

Publications

(* indicates equal contribution.)
VideoGPA

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

Hongyang Du*, Junjie Ye*, Xiaoyan Cong*, Runhao Li, Jingcheng Ni, Aman Agarwal, Zeqi Zhou, Zekun Li, Randall Balestriero, Yue Wang

TL;DR We propose an RL post-training framework that aligns video diffusion models toward 3D geometric consistency via Direct Preference Optimization (DPO).

UMO

UMO: Unified In-Context Learning Unlocks Motion Foundation Model Priors

Xiaoyan Cong*, Zekun Li*, Zhiyang Dou, Hongyu Li, Omid Taheri, Chuan Guo, Abhay Mittal, Sizhe An, Taku Komura, Wojciech Matusik, Michael J. Black, Srinath Sridhar

TL;DR We propose a unified formulation that unlocks the generative priors of a text-to-motion foundation model for diverse downstream tasks.

Deform360

Deform360: A Massive Multi-view Visuotactile Dataset for Deformable World Models

Hongyu Li, Wanjia Fu, Xiaoyan Cong, Zekun Li, Binghao Huang, Hanxiao Jiang, Xintong He, Yiqing Liang, Rao Fu, Tao Lu, Srinath Sridhar, Kevin A. Smith, George Konidaris, Yunzhu Li

VIVA

VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization

Xiaoyan Cong, Haotian Yang, Angtian Wang, Yizhi Wang, Yiding Yang, Canyu Zhang, Chongyang Ma

TL;DR We propose a scalable instruction-based video editing model that is the first to introduce RL post-training to the video editing domain.

PackUV

PackUV: Packed Gaussian UV Maps for 4D Volumetric Video

Aashish Rai, Angela Xing, Anushka Agarwal, Xiaoyan Cong, Zekun Li, Tao Lu, Srinath Sridhar

TL;DR We propose a volumetric video representation that maps 4D Gaussians onto UV atlases for codec-compatible, streamable playback, along with the largest multi-view 4D dataset.

DyTact

DyTact: Capturing Dynamic Contacts in Hand-Object Manipulation with Deformable Gaussians

Xiaoyan Cong, Angela Xing, Chandradeep Pokhariya, Rao Fu, Srinath Sridhar

TL;DR Markerless capture of dynamic hand-object contacts with deformable Gaussians.

Art3D

Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Xiaoyan Cong, Jiayi Shen, Zekun Li, Rao Fu, Tao Lu, Srinath Sridhar

TL;DR We propose a training-free solution for reconstructing geometry from flat-colored images.

GenHSI

GenHSI: Controllable Generation of Human-Scene Interaction Videos

Zekun Li, Rui Zhou, Rahul Sajnani, Xiaoyan Cong, Daniel Ritchie, Srinath Sridhar

TL;DR We propose chain-of-frame prompting for video diffusion models (VDMs) to generate plausible human-scene interaction (HSI) videos without training.

Oscillation Inversion

Oscillation Inversion: Training-free Image and Video Enhancement through Oscillated Latents in Large Flow Models

Yan Zheng, Zhenxiao Liang, Xiaoyan Cong, Yi Yang, Lanqing Guo, Yuehao Wang, Peihao Wang, Zhangyang Wang

TL;DR Exploits oscillatory patterns in the inversion of flow models for training-free image and video enhancement.

4DRecons

4DRecons: 4D Neural Implicit Deformable Objects Reconstruction from a single RGB-D Camera with Geometrical and Topological Regularizations

Xiaoyan Cong, Haitao Yang, Liyan Chen, Kaifeng Zhang, Li Yi, Chandrajit L. Bajaj, Qixing Huang

TL;DR Reconstruct deformable 3D objects from a single RGB-D video with novel geometrical and topological regularizations.

Colorization

Automatic Controllable Colorization by Imagination

Xiaoyan Cong, Yue Wu, Qifeng Chen, Chenyang Lei

TL;DR Automatic colorization via imagined reference images from generative models, enabling iterative editing.

Experience

Research Scientist Intern
ByteDance Inc.
Jun 2025 - Jan 2026, San Jose, CA
Advisor: Chongyang Ma