Geometry Reconstruction · Scene Reconstruction

This direction focuses on feed-forward reconstruction of dynamic 4D scenes, that is, 3D geometry that evolves over time. From multimodal, multi-view inputs (e.g., RGB images with camera intrinsics/extrinsics, depth maps, or point clouds), we aim to predict temporally consistent 3D geometry. The model jointly models spatial and temporal information with cross-view attention and outputs depth, 3D geometry, and scene flow in a single forward pass, without per-scene optimization. This enables efficient reconstruction of highly dynamic scenes for downstream tasks such as navigation or manipulation.
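To make "jointly modeling spatial and temporal information with cross-view attention" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not this project's model. It assumes patch tokens from T frames and V views are flattened into one sequence, so a single attention step covers both cross-view and temporal context. The function names, the single-head attention with a residual connection, the linear depth and scene-flow heads, and the unprojection of depth to world points with the camera intrinsics/extrinsics are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_spatiotemporal_attention(tokens, w_q, w_k, w_v):
    """Hypothetical single-head self-attention over all frames, views, and patches.

    tokens: (T, V, N, C) features for T frames, V views, N patches per view.
    Flattening T, V, and N into one sequence lets every patch attend to every
    other view (cross-view) and every other frame (temporal) in one step.
    """
    T, V, N, C = tokens.shape
    x = tokens.reshape(T * V * N, C)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    out = x + attn @ v  # residual connection; assumes w_v maps C -> C
    return out.reshape(T, V, N, C), attn


def predict_heads(features, w_depth, w_flow):
    """Hypothetical linear heads: per-patch depth and 3D scene flow."""
    depth = np.logaddexp(0.0, features @ w_depth)[..., 0]  # softplus keeps depth > 0
    flow = features @ w_flow  # (T, V, N, 3) assumed displacement toward the next frame
    return depth, flow


def unproject(depth, pixels, K, cam_to_world):
    """Lift per-pixel depth to world-space 3D points.

    depth: (N,), pixels: (N, 2) pixel coordinates (u, v),
    K: (3, 3) intrinsics, cam_to_world: (4, 4) rigid extrinsic transform.
    """
    ones = np.ones((pixels.shape[0], 1))
    rays = np.concatenate([pixels, ones], axis=1) @ np.linalg.inv(K).T  # K^-1 [u, v, 1]
    pts_cam = rays * depth[:, None]  # scale each ray by its depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t  # camera frame -> world frame


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, V, N, C = 2, 3, 4, 8  # 2 frames, 3 views, 4 patches per view, 8 channels
    tokens = rng.normal(size=(T, V, N, C))
    w_q, w_k, w_v = (rng.normal(scale=0.1, size=(C, C)) for _ in range(3))
    feats, attn = joint_spatiotemporal_attention(tokens, w_q, w_k, w_v)
    depth, flow = predict_heads(
        feats, rng.normal(scale=0.1, size=(C, 1)), rng.normal(scale=0.1, size=(C, 3))
    )
    print(feats.shape, attn.shape, depth.shape, flow.shape)
    # prints (2, 3, 4, 8) (24, 24) (2, 3, 4) (2, 3, 4, 3)
```

Flattening every frame and view into one sequence makes attention cost grow quadratically in T·V·N. Some feed-forward reconstruction models instead factorize attention, for example alternating per-frame and global attention, to keep this tractable; which variant this direction uses is not specified here.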