HMI Lab (ZGCA)

A foundational program established at Zhongguancun Academy: 'Fully-Conditional Hand-Object Interaction World Model', built in collaboration with HMI Lab (PKU).
We focus on cognition, imagination, and skill within a spatial interaction intelligence framework, aiming to build a closed loop of human–machine–environment interaction and to advance embodied agents toward unified knowing and acting.

On the cognition level, we explore fundamental frameworks for world abstraction and compression, enabling unified knowledge acquisition and evolution across spatial environments and human social interaction.
On the imagination level, we study controllable paradigms of state transition and evolution to build knowledge-driven, fully-conditional predictive models.
On the skill level, we investigate the underlying mechanisms by which embodied intent emerges, generating behaviors that remain consistent in both space and social context.
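To make "fully-conditional" concrete, below is a minimal sketch of a state-transition predictor that conditions the next world state on every available signal at once (vision, language, action). All class names, dimensions, and the simple concatenation-based fusion are illustrative assumptions, not the program's actual architecture.

```python
import torch
import torch.nn as nn

class FullyConditionalTransition(nn.Module):
    """Hypothetical sketch: predict the next latent world state from the
    current state plus all conditioning signals jointly. Names and sizes
    are assumptions for illustration only."""

    def __init__(self, state_dim=256, vision_dim=512, text_dim=384, action_dim=32):
        super().__init__()
        cond_dim = vision_dim + text_dim + action_dim
        self.fuse = nn.Sequential(
            nn.Linear(state_dim + cond_dim, 512),
            nn.GELU(),
            nn.Linear(512, state_dim),
        )

    def forward(self, state, vision, text, action):
        # Concatenate all conditions; a production model would more likely
        # use cross-attention over tokenized latents.
        cond = torch.cat([state, vision, text, action], dim=-1)
        return state + self.fuse(cond)  # residual next-state prediction


# Example usage with random embeddings standing in for real encoders.
model = FullyConditionalTransition()
state = torch.randn(4, 256)
next_state = model(state, torch.randn(4, 512), torch.randn(4, 384), torch.randn(4, 32))
```

The interface, one next-state prediction conditioned on all modalities jointly, is the point being illustrated; a fuller version would allow masking any subset of conditions to recover partially conditioned prediction.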

Research Directions
Five major directions for embodied interaction world models
1. Build a unified world model for navigation and manipulation, addressing weaknesses in long-horizon action generation and instruction understanding (see the sketch after this list).
2. Build a unified action understanding and generation model to address instruction generalization and action-vision alignment.
3. Fuse vision, language, and spatial perception to tackle interaction-level 4D cognition and semantic understanding.
4. Predict geometry from multimodal inputs to enable fast reconstruction in dynamic scenes.
5. Use egocentric perception for autonomous navigation, solving localization and decision-making in complex environments.
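The sketch referenced in direction 1 follows: one instruction-conditioned model whose single action vector spans both the mobile base (navigation) and the arm (manipulation), unrolled over a fixed horizon to target long-horizon generation. All dimensions, module choices, and names here are hypothetical, not the program's published design.

```python
import torch
import torch.nn as nn

class UnifiedNavManipPolicy(nn.Module):
    """Hypothetical sketch of a unified navigation + manipulation model:
    a shared action space of base velocities plus arm joints, generated
    as a fixed-horizon chunk from one instruction-conditioned core."""

    BASE_DIM, ARM_DIM = 3, 7  # (vx, vy, yaw rate) + 7 arm joints, assumed

    def __init__(self, obs_dim=512, text_dim=384, hidden=512, horizon=16):
        super().__init__()
        self.horizon = horizon
        self.encode = nn.Linear(obs_dim + text_dim, hidden)
        self.core = nn.GRU(hidden, hidden, batch_first=True)  # recurrent rollout
        self.head = nn.Linear(hidden, self.BASE_DIM + self.ARM_DIM)

    def forward(self, obs_emb, text_emb):
        # Fuse observation and instruction, then unroll an action chunk
        # covering both navigation and manipulation degrees of freedom.
        h = torch.tanh(self.encode(torch.cat([obs_emb, text_emb], dim=-1)))
        steps = h.unsqueeze(1).repeat(1, self.horizon, 1)
        out, _ = self.core(steps)
        return self.head(out)  # (batch, horizon, BASE_DIM + ARM_DIM)
```

Collapsing navigation and manipulation into one action space is one common way to let a single model serve both task families, and chunked fixed-horizon decoding is a standard tactic for the long-horizon weakness the direction names.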

