DyTact: Capturing Dynamic ConTacts
in Hand-Object Manipulation

Preprint


Xiaoyan Cong1 Angela Xing1 Chandradeep Pokhariya2 Rao Fu1 Srinath Sridhar1*

1Brown University 2IIT Delhi
*Corresponding Author

Abstract


Reconstructing dynamic hand-object contacts is essential for realistic manipulation in AI character animation, XR, and robotics, yet it remains challenging due to heavy occlusions, complex surface details, and limitations of existing capture techniques. In this paper, we introduce DyTact, a markerless, non-intrusive capture method that accurately recovers dynamic contacts in hand-object manipulation. Our approach leverages a dynamic, articulated representation based on 2D Gaussian surfels to model complex manipulations. By binding these surfels to MANO meshes, DyTact harnesses the inductive bias of template models to stabilize and accelerate optimization. A refinement module addresses time-dependent high-frequency deformations, while a contact-guided adaptive sampling strategy selectively increases surfel density in contact regions to handle heavy occlusion. Extensive experiments demonstrate that DyTact not only achieves state-of-the-art dynamic contact estimation accuracy but also significantly improves novel view synthesis quality, all while operating with fast optimization and efficient memory usage.

TL;DR


(1) We introduce DyTact, a method for accurate Dynamic conTact capture in complex hand-object manipulation.
(2) DyTact reconstructs both the hand and the object with dynamic 2D Gaussian surfels, enabling high-fidelity surface modeling without misalignment. We propose a contact-guided adaptive density control strategy that effectively handles self-occlusions and object occlusions, together with a time-dependent refinement module that precisely captures complex surface deformations for accurate contact estimation and dynamic reconstruction (a minimal sketch of the density-control idea appears after this list).
(3) Experimental results demonstrate the superior performance of DyTact in accurate dynamic contact estimation and high-fidelity novel view synthesis, coupled with fast optimization and efficient memory usage.
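
A rough, illustrative sketch of the contact-guided density control idea (not DyTact's actual implementation): the Python snippet below clones hand surfels that lie close to the object surface, using a simple nearest-neighbor distance as the contact cue. The function name, threshold, and jitter scale are assumptions made for illustration.

import numpy as np

def densify_contact_surfels(hand_pts, obj_pts, contact_thresh=0.005,
                            clones_per_surfel=2, jitter=0.001, rng=None):
    """Clone hand surfels lying within `contact_thresh` (meters) of the object
    surface so that likely contact regions receive a higher surfel density.
    hand_pts: (N, 3) array, obj_pts: (M, 3) array. Illustrative defaults only."""
    rng = np.random.default_rng() if rng is None else rng
    # Brute-force nearest-neighbor distance from each hand surfel to the object surfels.
    d = np.linalg.norm(hand_pts[:, None, :] - obj_pts[None, :, :], axis=-1).min(axis=1)
    contact_mask = d < contact_thresh                                  # surfels in likely contact
    seeds = np.repeat(hand_pts[contact_mask], clones_per_surfel, axis=0)
    new_pts = seeds + rng.normal(scale=jitter, size=seeds.shape)       # jittered clones
    return np.concatenate([hand_pts, new_pts], axis=0), contact_mask

In a full Gaussian splatting pipeline such a contact cue would presumably be combined with the usual gradient-based clone/split criteria; the point of the sketch is only that densification is biased toward regions where the hand and object surfaces are close.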

Method Overview


DyTact captures dynamic contacts with a markerless system built on a surface-aware, dynamic articulated Gaussian representation. Given multi-view RGB videos, it initializes Gaussian surfels on the hand by binding them locally to the tracked MANO mesh; these surfels remain rigged to the mesh throughout optimization. Object surfels are initialized from a coarse point cloud placed in the global coordinate frame. A refinement module addresses time-dependent high-frequency deformations, and a contact-guided adaptive sampling strategy selectively refines surfel density in contact regions to handle heavy occlusion. Further optimization of the surfels' geometry and appearance parameters yields high-fidelity reconstructions that enable accurate contact estimation.
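
To make the rigging step concrete, here is a minimal sketch of binding surfels to a template mesh via per-face barycentric coordinates and a normal offset, then re-posing them from each frame's tracked MANO vertices. This parameterization is a common choice for mesh-bound Gaussians and is assumed here for illustration; the function names and sampling scheme are not taken from the paper.

import numpy as np

def bind_surfels_to_mesh(faces, n_per_face=4, rng=None):
    """Sample surfels on mesh faces; store (face id, barycentric coords, normal offset).
    faces: (F, 3) int array of triangle vertex indices."""
    rng = np.random.default_rng() if rng is None else rng
    face_ids = np.repeat(np.arange(len(faces)), n_per_face)
    # Uniform barycentric samples on each triangle via the square-root trick.
    r1, r2 = rng.random(len(face_ids)), rng.random(len(face_ids))
    s = np.sqrt(r1)
    bary = np.stack([1.0 - s, s * (1.0 - r2), s * r2], axis=-1)
    offsets = np.zeros(len(face_ids))                        # start on the surface
    return face_ids, bary, offsets

def pose_surfels(verts, faces, face_ids, bary, offsets):
    """Re-pose bound surfels from the current (posed) MANO vertices.
    verts: (V, 3) float array, faces: (F, 3) int array."""
    tri = verts[faces[face_ids]]                             # (N, 3, 3) triangle corners
    pts = (bary[:, :, None] * tri).sum(axis=1)               # barycentric interpolation
    n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8    # unit face normals
    return pts + offsets[:, None] * n                        # offset along the normal

Calling pose_surfels with each frame's posed MANO vertices keeps the hand surfels rigged to the template, and a time-dependent refinement module can then predict small residual offsets on top of these positions to capture high-frequency deformations.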

Supplementary Video


More Results


Coming soon...



Citation


@misc{cong2025dytactcapturingdynamiccontacts,
  title={DyTact: Capturing Dynamic Contacts in Hand-Object Manipulation}, 
  author={Xiaoyan Cong and Angela Xing and Chandradeep Pokhariya and Rao Fu and Srinath Sridhar},
  year={2025},
  eprint={2506.03103},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.03103}, 
}