PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation
NeurIPS 2023
- Zhaoxi Chen1
- Fangzhou Hong1
- Haiyi Mei2
- Guangcong Wang1
- Lei Yang2
- Ziwei Liu1

1S-Lab, Nanyang Technological University  2SenseTime Research
Abstract
PrimDiffusion performs the diffusion and denoising process on a set of primitives which compactly represent 3D humans. This generative modeling enables explicit pose, view, and shape control, with the capability of modeling off-body topology in well-defined depth. Moreover, our method can generalize to novel poses without post-processing and enable downstream human-centric tasks like 3D texture transfer.
Framework
We represent the 3D human as K primitives learned from multi-view images. Each primitive Vk has independent kinematic parameters {Tk, Rk, sk} (translation, rotation, and per-axis scale, respectively) and radiance parameters {ck, σk} (color and density). At each time step t, we diffuse the clean primitives V0 with noise ϵ sampled according to a fixed noise schedule; the noisy primitives Vt are then fed to the denoiser gΦ(·), which learns to predict the denoised volumetric primitives.
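To make the forward process concrete, here is a minimal PyTorch sketch of diffusing primitive parameters under a fixed noise schedule and supervising a denoiser that predicts the clean primitives. It assumes the per-primitive kinematic and radiance parameters are packed into a single tensor V0 and uses a standard DDPM-style schedule; the tensor layout, schedule values, and the `denoiser` module are illustrative assumptions, not the released code.

```python
import torch

# Fixed noise schedule: cumulative products of (1 - beta_t) for T steps
# (a simple linear beta schedule is assumed here for illustration).
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffuse_primitives(V0: torch.Tensor, t: torch.Tensor):
    """Standard DDPM forward step applied to primitive parameters.

    V0: (B, K, D) tensor packing each primitive's translation, rotation,
        scale, color, and density (layout is an assumption for this sketch).
    t:  (B,) integer timesteps sampled uniformly in [0, T).
    """
    noise = torch.randn_like(V0)                  # eps ~ N(0, I)
    a_bar = alphas_cumprod[t].view(-1, 1, 1)      # broadcast over primitives and channels
    Vt = torch.sqrt(a_bar) * V0 + torch.sqrt(1.0 - a_bar) * noise
    return Vt, noise

# Illustrative training step: the denoiser g_Phi predicts the clean primitives from Vt.
# V0 = encoder(multi_view_images)                 # K primitives fitted by the encoder (see below)
# Vt, _ = diffuse_primitives(V0, t)
# loss = torch.nn.functional.mse_loss(denoiser(Vt, t), V0)
```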
To avoid per-subject optimization, we propose an encoder-only network that learns primitives from multi-view images across identities. The encoder consists of a motion branch and an appearance branch, which are fused by the proposed cross-modal attention layer to obtain the kinematic and radiance information of the primitives.
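Below is a minimal sketch of what such a cross-modal fusion layer could look like, assuming the motion branch yields one feature token per primitive (queries) and the appearance branch yields a set of image feature tokens (keys/values). The module, dimensions, rotation parameterization, and the single color/density output per primitive are simplifying assumptions for illustration, not the released architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Fuse per-primitive motion tokens with appearance tokens via cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Separate heads decode kinematic and radiance parameters per primitive.
        self.kinematic_head = nn.Linear(dim, 3 + 3 + 3)  # T_k, R_k (axis-angle, assumed), s_k
        self.radiance_head = nn.Linear(dim, 4)           # c_k (RGB) + sigma_k, simplified payload

    def forward(self, motion_tokens: torch.Tensor, appearance_tokens: torch.Tensor):
        # motion_tokens:     (B, K, dim), one query token per primitive
        # appearance_tokens: (B, N, dim), image features from the appearance branch
        fused, _ = self.attn(motion_tokens, appearance_tokens, appearance_tokens)
        fused = self.norm(fused + motion_tokens)         # residual connection over the queries
        return self.kinematic_head(fused), self.radiance_head(fused)
```

In practice each primitive carries a small voxel payload of color and density rather than a single RGBA value; the scalar heads above are kept only to keep the sketch short.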
Visualization of the Denoising Process
We visualize the denoising process of the primitives and the corresponding 360-degree novel views.
Video
Citation
Acknowledgements
PrimDiffusion is implemented on top of the DVA and Latent-Diffusion codebases. The training data are rendered with the XRFeitoria toolchain.
The website template is borrowed from Mip-NeRF.