Physics in human motion

There are 3 levels of physics knowledge integration into deep learning methods^[1]:

use the data generated from physics model. This is the least involved level, where the model equations are only used as a source to generate data for supervised training.
use soft constraints in the form of physics based loss term. This is where approaches like PINN belong.
use differentiable physics simulator. This is the tightest coupling currently, where the network weights are updated with the gradients calculated from an integrated simulator.

In human motion related problems, supervised training with real data dominates. Simulated data are not as popular because of the complexity and multimodality of human movement. However, we still can find some works that are close to this approach, such as PhysDiff^[5], which provides physics denoised data for the diffusion process by pushing the intermediate results through a physics simulator.

The second approach is not usually utilized the same way as in PINN, because the PDE describing human motion includes discrete contact terms that are difficult to get gradients from. However, intuitive physics loss terms such as simple stability and ground constraints are getting more attention, for example IPMAN^[2]. Another slightly related example is InterDiff^[4], where the trajectories of the interacting objects are kept simple by transformation to appropriate frames.

One work that uses the final approach is Weakly supervised learning of human dynamics^[3]. The paper did not incorporate a physics simulator, instead it used simple forward dynamics calculations supplied with prediction of contact information (which is also the approach of DnD^[6]). It would be interesting to check if SIP training would improve its performance. Probably likewise for DnD, however the training time could be prohibitive. Another paper much closer to the spirit of the final approach is DiffPhy^[7], where TDS is used in the reconstruction loop.

_[1] _{Thuerey, Nils et al. “Physics-based Deep Learning.” ArXiv abs/2109.05237 (2021): n. pag.}

_[2] _{Tripathi, Shashank et al. “3D Human Pose Estimation via Intuitive Physics.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023): 4713-4725.}

_[3]_{Zell, Petrissa et al. “Weakly-supervised Learning of Human Dynamics.” European Conference on Computer Vision (2020).}

_[4]_{Xu, Sirui et al. “InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion.” 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (2023): 14882-14894.}

_[5]_{Yuan, Ye et al. “PhysDiff: Physics-Guided Human Motion Diffusion Model.” 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (2022): 15964-15975.}

_[6]_{Li, Jiefeng et al. “D&D: Learning Human Dynamics from Dynamic Camera.” European Conference on Computer Vision (2022).}

_[7]_{Gartner, Erik et al. “Differentiable Dynamics for Articulated 3d Human Motion Reconstruction.” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022): 13180-13190.}

Enjoy Reading This Article?