HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

Zhongang Cai*
Daxuan Ren*
Ailing Zeng*
Zhengyu Lin*
Tao Yu*
Wenjia Wang*

Xiangyu Fan
Yang Gao
Yifan Yu
Liang Pan
Fangzhou Hong
Mingyuan Zhang

Chen Change Loy
Lei Yang^
Ziwei Liu^

Shanghai Artificial Intelligence Laboratory
S-Lab, Nanyang Technological University

SenseTime Research
The Chinese University of Hong Kong
Tsinghua University

* co-first authors
^ co-corresponding authors

ECCV 2022 (Oral)

4D human sensing and modeling are fundamental tasks in vision and graphics with numerous applications. With the advances of new sensors and algorithms, there is an increasing demand for more versatile datasets. In this work, we contribute HuMMan, a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences, and 60M frames. HuMMan has several appealing properties: 1) multi-modal data and annotations, including color images, point clouds, keypoints, SMPL parameters, and textured meshes; 2) a popular mobile device is included in the sensor suite; 3) a set of 500 actions designed to cover fundamental movements; 4) multiple tasks, such as action recognition, pose estimation, parametric human recovery, and textured mesh reconstruction, are supported and evaluated. Extensive experiments on HuMMan highlight the need for further study of challenges such as fine-grained action recognition, dynamic human mesh reconstruction, point cloud-based parametric human recovery, and cross-device domain gaps.

An introductory video of HuMMan


[2023-01-23] Release of textured meshes and toolbox.

[2023-01-23] Minor fixes on the mask data, download links have been updated.

[2023-01-11] Release of HuMMan v1.0: Reconstruction Subset.

[2022-10-27] We presented HuMMan as an oral paper at ECCV'22 (Tel Aviv, Israel).

[2022-08] Release of HuMMan v0.1 (no longer available, please use v1.0).

Register for future updates!

Scale and Modalities

Action Set

Subject Examples

Full Text


HuMMan v1.0: Reconstruction Subset (New!)

HuMMan v1.0: Reconstruction Subset consists of 153 subjects and 339 sequences. Color images, masks (via matting), SMPL parameters, and camera parameters are provided. It is a challenging dataset due to its diverse subject appearances and expressive actions. Moreover, it unleashes the potential to benchmark reconstruction algorithms under realistic settings with commercial sensors, dynamic subjects, and computer-vision-powered automatic annotations. We also provide textured meshes reconstructed from multi-view RGB-D images using a classical pipeline.

Color images, masks, SMPL parameters, and camera parameters:

Part 1: Aliyun or OneDrive(CN) (~81 GB)
Part 2: Aliyun or OneDrive(CN) (~83 GB)
Part 3: Aliyun or OneDrive(CN) (~80 GB)

Textured meshes: Aliyun or OneDrive(CN) (~22 GB)

Suggested splits: train and test.

More details and a toolbox can be found here.
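Assuming the released camera parameters follow a conventional pinhole model (intrinsics K, extrinsics R and t), a minimal sketch of projecting 3D points such as SMPL joints into a view might look like the following. The specific matrix values and the file layout are illustrative only; please refer to the official toolbox for the actual parameter loaders.

```python
import numpy as np

# Hypothetical camera parameters; the actual file layout in the release
# may differ, so consult the official toolbox for the real loaders.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])  # intrinsics (fx, fy, cx, cy)
R = np.eye(3)                          # extrinsic rotation (world -> camera)
t = np.array([0.0, 0.0, 3.0])          # extrinsic translation

def project_points(points_world, K, R, t):
    """Project Nx3 world-space points (e.g. SMPL joints) to pixel coordinates."""
    cam = points_world @ R.T + t       # transform into the camera frame
    uv = cam @ K.T                     # apply pinhole intrinsics
    return uv[:, :2] / uv[:, 2:3]      # perspective divide

joints = np.array([[0.0,  0.0, 0.0],   # toy stand-ins for SMPL joints
                   [0.1, -0.2, 0.05]])
pixels = project_points(joints, K, R, t)
print(pixels)  # the world origin projects to the principal point (320, 240)
```

The same projection can be used to sanity-check that the provided SMPL parameters, camera parameters, and masks are mutually consistent, by overlaying projected joints on the color images.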

Please contact Zhongang Cai for feedback or to add benchmarks.

Benchmark: Generalizable Animatable Avatar from Single Image

[1] Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering
[2] MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images


HuMMan is under S-Lab License v1.0.


@inproceedings{cai2022humman,
    title={HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling},
    author={Cai, Zhongang and Ren, Daxuan and Zeng, Ailing and Lin, Zhengyu and Yu, Tao and Wang, Wenjia and
            Fan, Xiangyu and Gao, Yang and Yu, Yifan and Pan, Liang and Hong, Fangzhou and Zhang, Mingyuan and
            Loy, Chen Change and Yang, Lei and Liu, Ziwei},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    month={October},
    year={2022},
}