Ground Reaction Inertial Poser (GRIP)
Physics-based Human Motion Capture
from Sparse IMUs and Insole Pressure Sensors

Ryosuke Hori1,2,3 · Jyun-Ting Song1 · Zhengyi Luo1 · Jinkun Cao1 · Soyong Shin1 · Hideo Saito2,3 · Kris Kitani1

1Carnegie Mellon University  2Keio University  3Keio AI Research Center

CVPR 2026
GRIP Overview

Overview of the proposed Ground Reaction Inertial Poser (GRIP). (a) GRIP observes motion using four IMUs and foot pressure data from smartwatches and smart insoles. (b) Full-body motion is reconstructed by driving a humanoid with joint torques in a physics simulator. (c) The PRISM dataset offers multimodal measurements, including IMUs, foot pressure, motion data, and environmental data.

Video

Abstract

We propose Ground Reaction Inertial Poser (GRIP), a method that reconstructs physically plausible human motion using four wearable devices. Unlike conventional IMU-only approaches, GRIP combines IMU signals with foot pressure data to capture both body dynamics and ground interactions. Furthermore, rather than relying solely on kinematic estimation, GRIP uses a digital twin of a person, in the form of a synthetic humanoid in a physics simulator, to reconstruct realistic and physically plausible motion. At its core, GRIP consists of two modules: KinematicsNet, which estimates body poses and velocities from sensor data, and DynamicsNet, which controls the humanoid in the simulator using the residual between the KinematicsNet prediction and the simulated humanoid state. To enable robust training and fair evaluation, we introduce a large-scale dataset, Pressure and Inertial Sensing for Human Motion and Interaction (PRISM), that captures diverse human motions with synchronized IMUs and insole pressure sensors. Experimental results show that GRIP outperforms existing IMU-only and IMU–pressure fusion methods across all evaluated datasets, achieving higher global pose accuracy and improved physical consistency.

Method

GRIP Framework

Overview of the GRIP framework. Input Data consists of IMU and insole measurements. KinematicsNet estimates kinematic states, and the State Difference compares them with the simulated humanoid. DynamicsNet drives the humanoid through physics simulation-based control. The PRISM dataset provides diverse multi-modal training data.

Input Data

GRIP takes four IMU signals (two wrists, two insole-embedded) and foot pressure data, including vertical ground reaction forces (GRF), center of pressure (CoP), and binary contact labels. Sensors are compact enough for everyday wear.
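As an illustrative sketch of how such a per-frame input might be assembled, the snippet below concatenates four IMU readings (orientation plus acceleration) and insole pressure measurements (per-foot vertical GRF, 2D CoP, and binary contact) into one feature vector. The field names and feature layout are assumptions for illustration, not the paper's exact representation.

```python
def flatten_frame(imus, pressure):
    """Concatenate 4 IMU readings and insole pressure into one feature vector.

    imus: list of 4 dicts with 'rot' (3x3 rotation, row-major) and 'acc' (3,)
    pressure: dict with per-foot vertical GRF, 2D CoP, and binary contact labels
    """
    feat = []
    for imu in imus:
        for row in imu["rot"]:        # flattened 3x3 orientation matrix
            feat.extend(row)
        feat.extend(imu["acc"])       # linear acceleration
    feat.extend(pressure["vgrf"])     # vertical GRF per foot (2,)
    for cop in pressure["cop"]:       # 2D centre of pressure per foot
        feat.extend(cop)
    feat.extend(pressure["contact"])  # binary contact labels (2,)
    return feat

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
imus = [{"rot": identity, "acc": [0.0, 0.0, 9.81]} for _ in range(4)]
pressure = {"vgrf": [350.0, 360.0], "cop": [[0.01, 0.02], [0.0, 0.03]],
            "contact": [1, 1]}
frame = flatten_frame(imus, pressure)
print(len(frame))  # 4*(9+3) + 2 + 4 + 2 = 56
```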

KinematicsNet

A progressive LSTM-based network that estimates leaf-joint positions, full-joint positions, full-body joint angles, and leaf-joint velocities from raw sensor inputs frame by frame. Outputs are stored in a history buffer for fall recovery.
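The progressive (cascaded) structure can be sketched as follows: each stage consumes the raw sensor features plus the outputs of earlier stages, and per-frame estimates are appended to a bounded history buffer. The stage functions here are trivial placeholders standing in for the paper's LSTM networks, and the joint counts (5 leaf joints, 23 body joints) are assumptions for illustration.

```python
from collections import deque

def stage(inputs, out_dim):
    # Placeholder "network": a fixed-size summary of its inputs.
    s = sum(inputs) / max(len(inputs), 1)
    return [s] * out_dim

def kinematics_net(sensor_feat):
    leaf_pos = stage(sensor_feat, 5 * 3)              # leaf-joint positions
    full_pos = stage(sensor_feat + leaf_pos, 23 * 3)  # full-joint positions
    angles   = stage(sensor_feat + full_pos, 23 * 3)  # full-body joint angles
    leaf_vel = stage(sensor_feat + leaf_pos, 5 * 3)   # leaf-joint velocities
    return {"leaf_pos": leaf_pos, "full_pos": full_pos,
            "angles": angles, "leaf_vel": leaf_vel}

# History buffer of recent estimates, available for fall recovery.
history = deque(maxlen=100)
for t in range(3):
    est = kinematics_net([0.1 * t] * 56)
    history.append(est)
print(len(history), len(history[-1]["angles"]))  # 3 69
```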

State Difference

An intermediate representation capturing discrepancies between the KinematicsNet estimates and the simulated humanoid state. Includes leaf-joint rotational/velocity differences and full-body root-relative joint position differences.
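A minimal sketch of such a difference computation is shown below: the rotational discrepancy between an estimated and a simulated orientation (as quaternions), and a root-relative position difference. The quaternion layout (w, x, y, z) and the specific difference terms are illustrative assumptions.

```python
import math

def quat_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_mul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def rot_difference(q_est, q_sim):
    """Relative rotation taking the simulated frame to the estimated one."""
    return quat_mul(q_est, quat_conj(q_sim))

def pos_difference(p_est, p_sim, root_est, root_sim):
    """Root-relative joint position difference."""
    rel_est = [a - b for a, b in zip(p_est, root_est)]
    rel_sim = [a - b for a, b in zip(p_sim, root_sim)]
    return [a - b for a, b in zip(rel_est, rel_sim)]

# 90-degree yaw estimate vs. identity simulated state:
q_est = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
q_sim = (1.0, 0.0, 0.0, 0.0)
dq = rot_difference(q_est, q_sim)
angle = 2.0 * math.acos(dq[0])
print(round(math.degrees(angle)))  # 90
```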

DynamicsNet

A physics-based MLP policy trained with PPO that drives a torque-controlled humanoid in a simulator. Observations include sensor data, State Difference, self-state, and environment height map. A fall recovery mechanism ensures stable inference.
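Torque-based humanoid control of this kind is commonly realized by letting the policy output target joint angles and converting them to torques with PD control. The sketch below shows that conversion and a single-joint rollout; the gains, inertia, and integration scheme are illustrative assumptions, not the paper's values.

```python
def pd_torques(q_target, q, qd, kp=300.0, kd=30.0):
    """Per-joint torque: tau = kp * (q_target - q) - kd * qd."""
    return [kp * (t - a) - kd * v for t, a, v in zip(q_target, q, qd)]

# Single-joint rollout toward a 0.5 rad target at 100 Hz
# (semi-implicit Euler integration, unit inertia).
q, qd, dt, inertia = [0.0], [0.0], 1.0 / 100.0, 1.0
target = [0.5]
for _ in range(200):
    tau = pd_torques(target, q, qd)
    qd[0] += tau[0] / inertia * dt  # integrate joint velocity
    q[0] += qd[0] * dt              # integrate joint angle
print(round(q[0], 2))  # 0.5
```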

PRISM Dataset

To enable robust training and fair evaluation of GRIP, we introduce Pressure and Inertial Sensing for Human Motion and Interaction (PRISM), a new large-scale multimodal dataset capturing diverse human motions with synchronized IMUs and insole pressure sensors, optical motion capture, and physical object models. PRISM covers daily activities (walking, jogging), slow movements (stretching, squats), fast sports actions (golf, baseball, soccer), and object interactions (stepping onto or sitting on objects). The dataset consists of 1,275 ten-second sequences from six subjects (~3.5 hours total) at 100 Hz, with SMPL pose labels obtained via MoSh.
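The quoted totals follow directly from the stated figures, as a quick sanity check (pure arithmetic on the numbers above):

```python
# 1,275 ten-second sequences recorded at 100 Hz.
sequences, seconds_each, rate_hz = 1275, 10, 100
total_hours = sequences * seconds_each / 3600
total_frames = sequences * seconds_each * rate_hz
print(round(total_hours, 2), total_frames)  # 3.54 1275000
```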

PRISM dataset provides multimodal measurements including IMUs, foot pressure, motion capture, and 3D environment models for physically consistent motion evaluation.

Download Dataset (Coming Soon)

Results

Qualitative Comparison

Qualitative comparison across the three datasets. Our method accurately reconstructs foot placement on objects (PRISM), exhibits less position drift (UnderPressure), and captures slow weight-shifting motions (PSU-TMM100).

Quantitative Comparison

Lower values indicate better performance for all metrics. Bold = best, underline = second best.

| Dataset       | Method       | MPJPE↓ [mm] | PEL-MPJPE↓ [mm] | PA-MPJPE↓ [mm] | MPJRE↓ [deg] | Acc↓ [m/s²] | FS↓ [m/s] | FP↓ [mm] | vGRF↓ [N] |
|---------------|--------------|-------------|-----------------|----------------|--------------|-------------|-----------|----------|-----------|
| PRISM         | PIP          | 248.59      | 85.48           | 33.35          | 17.08        | 6.68        | 0.20      | 10.71    | 246.85    |
|               | GlobalPose   | 198.30      | 43.50           | 31.29          | 12.01        | 7.72        | 0.22      | 9.72     | 299.22    |
|               | MobilePoser  | 267.45      | 72.76           | 55.69          | 17.99        | 6.20        | 0.19      | 9.97     | 248.54    |
|               | FoRM         | 199.60      | 87.34           | 63.75          | 20.23        | 9.67        | 0.36      | 15.64    | –         |
|               | SolePoser    | –           | –               | 82.75          | –            | –           | –         | 9.68     | –         |
|               | GRIP (Ours)  | 182.44      | 63.85           | 46.47          | 13.89        | 7.30        | 0.21      | 5.77     | 258.40    |
| UnderPressure | PIP          | 523.65      | 29.89           | 21.08          | 8.35         | 12.59       | 0.32      | 1.43     | 265.62    |
|               | GlobalPose   | 301.12      | 21.49           | 17.41          | 7.40         | 16.65       | 0.32      | 3.31     | 287.12    |
|               | MobilePoser  | 626.62      | 44.28           | 33.74          | 11.73        | 12.62       | 0.35      | 1.27     | 244.78    |
|               | FoRM         | 553.19      | 59.29           | 32.60          | 18.50        | 17.34       | 0.57      | 21.52    | –         |
|               | SolePoser    | –           | –               | 34.52          | –            | –           | –         | 24.27    | –         |
|               | GRIP (Ours)  | 218.09      | 37.27           | 27.16          | 7.64         | 13.22       | 0.31      | 0.00     | 278.27    |
| PSU-TMM100    | PIP          | 182.14      | 87.56           | 61.62          | 21.12        | 1.62        | 0.09      | 5.84     | 367.87    |
|               | GlobalPose   | 175.96      | 63.05           | 50.28          | 18.71        | 2.09        | 0.14      | 2.95     | 340.11    |
|               | MobilePoser  | 210.66      | 112.35          | 85.46          | 28.10        | 2.37        | 0.10      | 5.28     | 358.19    |
|               | FoRM         | 126.60      | 98.02           | 82.45          | 25.19        | 1.60        | 0.13      | 4.51     | –         |
|               | SolePoser    | –           | –               | 97.11          | –            | –           | –         | 1.36     | –         |
|               | GRIP (Ours)  | 118.60      | 70.32           | 55.72          | 16.72        | 4.31        | 0.11      | 0.73     | 316.06    |

Entries marked "–" are not reported by the corresponding method.

Key Contributions

Minimal Sensor Setup

GRIP achieves accurate full-body motion estimation using only four IMUs worn on the wrists and feet, combined with insole pressure data—a setup practical for everyday use.

Two-Stage Architecture

KinematicsNet + DynamicsNet enables observer–controller decomposition: kinematic estimation feeds a physics-based humanoid controller, ensuring physically plausible motion without auxiliary forces.

PRISM Dataset

A new public multimodal dataset with 1,275 sequences (~3.5 hours) covering diverse motions, synchronized IMU/pressure, optical MoCap, and physical object models for comprehensive evaluation.

BibTeX

@inproceedings{hori2026grip,
    title     = {Ground Reaction Inertial Poser: Physics-based Human Motion Capture
                 from Sparse IMUs and Insole Pressure Sensors},
    author    = {Hori, Ryosuke and Song, Jyun-Ting and Luo, Zhengyi and Cao, Jinkun
                 and Shin, Soyong and Saito, Hideo and Kitani, Kris},
    booktitle = {Proceedings of the IEEE/CVF Conference on
                 Computer Vision and Pattern Recognition (CVPR)},
    year      = {2026}
}