HMI Lab (ZGCA)

A foundational program established at Zhongguancun Academy: 'Fully-Conditional Hand-Object Interaction World Model', built in collaboration with HMI Lab (PKU).
We focus on cognition, imagination, and skill within a spatial interaction intelligence framework, aiming to build a closed loop of human–machine–environment interaction and to advance embodied agents toward unified knowing and acting.

On the cognition level, we explore fundamental frameworks for world abstraction and compression, enabling unified knowledge acquisition and evolution across spatial environments and human social interaction.
On the imagination level, we study controllable paradigms of state transition and evolution to build knowledge-driven, fully-conditional predictive models.
On the skill level, we investigate the underlying mechanisms by which embodied intent emerges, generating behaviors that remain consistent in both space and social context.
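To make "fully-conditional" concrete, below is a minimal sketch of a state-transition predictor that conditions the next world state on every available signal at once (vision, language, action). All class names, dimensions, and the simple concatenation-based fusion are illustrative assumptions, not the program's actual architecture.

```python
import torch
import torch.nn as nn

class FullyConditionalTransition(nn.Module):
    """Hypothetical sketch: predict the next latent world state from the
    current state plus all conditioning signals jointly. Names and sizes
    are assumptions for illustration only."""

    def __init__(self, state_dim=256, vision_dim=512, text_dim=384, action_dim=32):
        super().__init__()
        cond_dim = vision_dim + text_dim + action_dim
        self.fuse = nn.Sequential(
            nn.Linear(state_dim + cond_dim, 512),
            nn.GELU(),
            nn.Linear(512, state_dim),
        )

    def forward(self, state, vision, text, action):
        # Concatenate all conditions; a production model would more likely
        # use cross-attention over tokenized latents.
        cond = torch.cat([state, vision, text, action], dim=-1)
        return state + self.fuse(cond)  # residual next-state prediction


# Example usage with random embeddings standing in for real encoders.
model = FullyConditionalTransition()
state = torch.randn(4, 256)
next_state = model(state, torch.randn(4, 512), torch.randn(4, 384), torch.randn(4, 32))
```

The interface, one next-state prediction conditioned on all modalities jointly, is the point being illustrated; a fuller version would allow masking any subset of conditions to recover partially conditioned prediction.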

Research Directions
Five major directions for embodied interaction world models
1. Build a unified world model for navigation and manipulation, addressing weaknesses in long-horizon action generation and instruction understanding (see the sketch after this list).
2. Build a unified action understanding and generation model to address instruction generalization and action-vision alignment.
3. Fuse vision, language, and spatial perception to tackle interaction-level 4D cognition and semantic understanding.
4. Predict geometry from multimodal inputs to enable fast reconstruction in dynamic scenes.
5. Use egocentric perception for autonomous navigation, solving localization and decision-making in complex environments.
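The sketch referenced in direction 1 follows: one instruction-conditioned model whose single action vector spans both the mobile base (navigation) and the arm (manipulation), unrolled over a fixed horizon to target long-horizon generation. All dimensions, module choices, and names here are hypothetical, not the program's published design.

```python
import torch
import torch.nn as nn

class UnifiedNavManipPolicy(nn.Module):
    """Hypothetical sketch of a unified navigation + manipulation model:
    a shared action space of base velocities plus arm joints, generated
    as a fixed-horizon chunk from one instruction-conditioned core."""

    BASE_DIM, ARM_DIM = 3, 7  # (vx, vy, yaw rate) + 7 arm joints, assumed

    def __init__(self, obs_dim=512, text_dim=384, hidden=512, horizon=16):
        super().__init__()
        self.horizon = horizon
        self.encode = nn.Linear(obs_dim + text_dim, hidden)
        self.core = nn.GRU(hidden, hidden, batch_first=True)  # recurrent rollout
        self.head = nn.Linear(hidden, self.BASE_DIM + self.ARM_DIM)

    def forward(self, obs_emb, text_emb):
        # Fuse observation and instruction, then unroll an action chunk
        # covering both navigation and manipulation degrees of freedom.
        h = torch.tanh(self.encode(torch.cat([obs_emb, text_emb], dim=-1)))
        steps = h.unsqueeze(1).repeat(1, self.horizon, 1)
        out, _ = self.core(steps)
        return self.head(out)  # (batch, horizon, BASE_DIM + ARM_DIM)
```

Collapsing navigation and manipulation into one action space is one common way to let a single model serve both task families, and chunked fixed-horizon decoding is a standard tactic for the long-horizon weakness the direction names.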

