Image- and video-based 3D human recovery (i.e., pose and shape estimation) has achieved substantial progress.
However, due to the prohibitive cost of motion capture, existing datasets are often limited in scale and diversity.
In this work, we obtain massive human sequences, with automatically annotated 3D ground truths, by playing a video game.
Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine,
featuring a highly diverse set of subjects, actions, and scenarios. More importantly, we study the use of game-playing data
and obtain five major insights. First, game-playing data is surprisingly effective. A simple frame-based baseline trained
on GTA-Human outperforms more sophisticated methods by a large margin. For video-based methods, GTA-Human is even on par
with the in-domain training set. Second, we discover that synthetic data provides a critical complement to real data, which is typically collected indoors. Our investigation into the domain gap explains why our simple yet effective data mixture strategies work. Third, the scale of the dataset matters: the performance boost is closely tied to the
additional data available. A systematic study reveals the model sensitivity to data density from multiple key aspects.
Fourth, the effectiveness of GTA-Human is also attributed to the rich collection of strong supervision labels (SMPL parameters),
which are otherwise expensive to acquire for real datasets. Fifth, the benefits of synthetic data extend to larger models,
such as deeper convolutional neural networks (CNNs) and Transformers, which also show significant gains.
We hope our work paves the way for scaling up 3D human recovery to the real world.
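The data mixture strategy discussed above can be sketched minimally: concatenate real (indoor) and synthetic (game-playing) samples and shuffle them, so each training batch draws from both domains. The dataset names, sample counts, and helper below are hypothetical placeholders, not the official training pipeline.

```python
import random

# Hypothetical toy samples standing in for the two data sources
# (counts are illustrative only).
real_samples = [("real", i) for i in range(3)]        # e.g., indoor mocap clips
synthetic_samples = [("gta", i) for i in range(6)]    # e.g., game-playing clips

def mix_datasets(real, synthetic, seed=0):
    """Concatenate real and synthetic samples, then shuffle with a fixed
    seed, so batches sampled sequentially mix both domains."""
    mixed = list(real) + list(synthetic)
    random.Random(seed).shuffle(mixed)
    return mixed

mixed = mix_datasets(real_samples, synthetic_samples)
print(len(mixed))  # 9
```

In practice, a weighted sampling ratio between the two sources could replace the plain concatenation, but the key idea is the same: synthetic data supplements, rather than replaces, the real training set.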
Please refer to MMHuman3D, where we provide data download links, code, and pretrained models. MMHuman3D is an open-source, PyTorch-based codebase for the use of 3D human parametric models in computer vision and computer graphics.
Examples
Citation
@article{cai2021playing,
title={Playing for 3D human recovery},
author={Cai, Zhongang and Zhang, Mingyuan and Ren, Jiawei and Wei, Chen and Ren, Daxuan and
Lin, Zhengyu and Zhao, Haiyu and Yang, Lei and Liu, Ziwei},
journal={arXiv preprint arXiv:2110.07588},
year={2021}
}