Haoru Wang Papers

ICLR Conference 2025 Conference Paper

Aligning Human Motion Generation with Human Perceptions

Haoru Wang
Wentao Zhu 0004
Luyi Miao
Yishu Xu
Feng Gao 0014
Qi Tian 0001
Yizhou Wang 0001

Human motion generation is a critical task with a wide spectrum of applications. Achieving high realism in generated motions requires naturalness, smoothness, and plausibility. However, current evaluation metrics often rely on simple heuristics or distribution distances and do not align well with human perceptions. In this work, we propose a data-driven approach to bridge this gap by introducing a large-scale human perceptual evaluation dataset, MotionPercept, and a human motion critic model, MotionCritic, that capture human perceptual preferences. Our critic model offers a more accurate metric for assessing motion quality and could be readily integrated into the motion generation pipeline to enhance generation quality. Extensive experiments demonstrate the effectiveness of our approach in both evaluating and improving the quality of generated human motions by aligning with human perceptions. Code and data are publicly available at https://motioncritic.github.io/.

Details

TMLR Journal 2025 Journal Article

Learning Deformable Body Interactions With Adaptive Spatial Tokenization

Hao Wang
Yu Liu
Daniel Biggs
Haoru Wang
Jiandong Yu
Ping Huang

Simulating interactions between deformable bodies is vital in fields like material science, mechanical design, and robotics. While learning-based methods with Graph Neural Networks (GNNs) are effective at solving complex physical systems, they encounter scalability issues when modeling deformable body interactions. To model interactions between objects, pairwise global edges have to be created dynamically, which is computationally intensive and impractical for large-scale meshes. To overcome these challenges, drawing on insights from geometric representations, we propose an Adaptive Spatial Tokenization (AST) method for efficient representation of physical states. By dividing the simulation space into a grid of cells and mapping unstructured meshes onto this structured grid, our approach naturally groups adjacent mesh nodes. We then apply a cross-attention module to map the sparse cells into a compact, fixed-length embedding, serving as tokens for the entire physical state. Self-attention modules are employed to predict the next state over these tokens in latent space. This framework leverages the efficiency of tokenization and the expressive power of attention mechanisms to achieve accurate and scalable simulation results. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in modeling deformable body interactions. Notably, it remains effective on large-scale simulations with meshes exceeding 100,000 nodes, where existing methods are hindered by computational limitations. Additionally, we contribute a novel large-scale dataset encompassing a wide range of deformable body interactions to support future research in this area.

PDF Details

Possible papers

Aligning Human Motion Generation with Human Perceptions

Learning Deformable Body Interactions With Adaptive Spatial Tokenization