TANGO: Text-driven PhotoreAlistic aNd Robust 3D Stylization via LiGhting DecompOsition

1 South China University of Technology 2 The Hong Kong Polytechnic University
3 DexForce Co. Ltd. 4 Peng Cheng Laboratory
NeurIPS 2022 Spotlight

News

🏆 [Nov 11th 2022] TANGO was selected as a NeurIPS 2022 spotlight paper.


Abstract

Creation of 3D content by stylization is a promising yet challenging problem in computer vision and graphics research. In this work, we focus on stylizing photorealistic appearance renderings of a given surface mesh of arbitrary topology. Motivated by the recent surge of cross-modal supervision from the Contrastive Language-Image Pre-training (CLIP) model, we propose TANGO, which transfers the appearance style of a given 3D shape according to a text prompt in a photorealistic manner. Technically, we propose to disentangle the appearance style into the spatially varying bidirectional reflectance distribution function, the local geometric variation, and the lighting condition, which are jointly optimized, under the supervision of a CLIP loss, through a spherical Gaussian-based differentiable renderer. As such, TANGO enables photorealistic 3D style transfer by automatically predicting reflectance effects even for bare, low-quality meshes, without training on a task-specific dataset. Extensive experiments show that TANGO outperforms existing methods of text-driven 3D style transfer in terms of photorealistic quality, consistency of 3D geometry, and robustness when stylizing low-quality meshes.


Method


Given a scaled bare mesh and a text prompt (e.g., “a shoe made of gold” as in this figure), we first specify a camera location c and cast rays R to intersect with the given mesh. For each camera ray that intersects the mesh, we obtain the surface point xp and normal np, which are fed to two MLPs, the SVBRDF Network and the Normal Network, to predict spatially varying BRDF parameters and a normal variation. Meanwhile, the lighting is represented by a set of spherical Gaussian parameters {μk, λk, ak}. At each iteration, we render the image with a differentiable SG renderer and encode the augmented rendering with the image encoder of a CLIP model; the gradients of the CLIP loss are then backpropagated to update all learnable parameters.
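
Below is a minimal PyTorch sketch of this optimization loop, intended only as an illustration of the pipeline described above and not as the official implementation. SVBRDFNet, NormalNet, sample_camera, ray_mesh_intersect, sg_render, augment, mesh, and the iteration count are hypothetical placeholders; only the CLIP calls use the public OpenAI CLIP package.

import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
text_feat = clip_model.encode_text(clip.tokenize(["a shoe made of gold"]).to(device))

svbrdf_net = SVBRDFNet().to(device)        # hypothetical MLP: x_p -> diffuse, roughness, specular
normal_net = NormalNet().to(device)        # hypothetical MLP: (x_p, n_p) -> normal variation
K = 32                                     # number of spherical Gaussian lobes (assumed)
sg_light = torch.nn.Parameter(torch.randn(K, 7, device=device))  # per lobe: mu (3), lambda (1), a (3)

params = list(svbrdf_net.parameters()) + list(normal_net.parameters()) + [sg_light]
optimizer = torch.optim.Adam(params, lr=5e-4)

for it in range(1500):                                        # iteration count is an assumption
    rays_o, rays_d = sample_camera()                          # specify a camera location c and cast rays R
    x_p, n_p, mask = ray_mesh_intersect(mesh, rays_o, rays_d) # surface points and normals at hit points
    brdf = svbrdf_net(x_p)                                    # spatially varying BRDF parameters
    normal = normal_net(x_p, n_p)                             # predicted normal variation
    image = sg_render(brdf, normal, sg_light, rays_d, mask)   # differentiable SG renderer
    img_feat = clip_model.encode_image(augment(image))        # encode the augmented rendering
    loss = 1.0 - F.cosine_similarity(img_feat, text_feat).mean()
    optimizer.zero_grad()
    loss.backward()                                           # gradients update all learnable parameters
    optimizer.step()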


Neural stylization

3D stylization results of TANGO given a bare mesh and text prompt.


Disentanglement

An illustration of the disentangled rendering components. This figure shows (a) our final rendering result for “A shoe made of brick” and the individual rendering components: (b) the original normal map of the input mesh, (c) the normal map predicted by TANGO, (d) the diffuse map predicted by TANGO, (e) the spatially varying roughness map, (f) the spatially varying specular map, and (g) the environment map. Note that the BRDF is composed of the diffuse, roughness, and specular maps.
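
For intuition, the sketch below combines such disentangled maps with a generic Lambertian-plus-Blinn-Phong shading model. It is an illustration only, not the closed-form spherical Gaussian rendering used by TANGO, and the function name and tensor conventions are assumptions.

import math
import torch
import torch.nn.functional as F

def shade(diffuse, roughness, specular, normal, light_dir, view_dir, light_color):
    # diffuse, specular, normal, light_dir, view_dir, light_color: (..., 3); roughness: (..., 1)
    n = F.normalize(normal, dim=-1)
    l = F.normalize(light_dir, dim=-1)
    v = F.normalize(view_dir, dim=-1)
    h = F.normalize(l + v, dim=-1)                          # half vector
    n_dot_l = (n * l).sum(-1, keepdim=True).clamp(min=0.0)
    n_dot_h = (n * h).sum(-1, keepdim=True).clamp(min=0.0)
    diff_term = diffuse / math.pi                           # Lambertian diffuse
    shininess = 2.0 / roughness.clamp(min=1e-3) ** 2        # lower roughness -> sharper highlight
    spec_term = specular * n_dot_h ** shininess
    return (diff_term + spec_term) * light_color * n_dot_l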


Relighting

An illustration of the relighting performance. The left image shows our rendering result and the estimated environment map for “A vase made of wood”, while the following three images show relit results under different environment maps downloaded from the Internet.
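
In a sketch consistent with the loop in the Method section, relighting amounts to freezing the optimized networks and replacing only the spherical Gaussian lighting, e.g. with lobes fitted to a new HDR environment map. fit_sg_to_envmap is a hypothetical helper and the file name is only an example.

import torch

with torch.no_grad():
    new_sg_light = fit_sg_to_envmap("outdoor_env.hdr", num_lobes=32)   # fit SG lobes to a downloaded map
    relit = sg_render(svbrdf_net(x_p), normal_net(x_p, n_p),
                      new_sg_light, rays_d, mask)                      # material and normals stay fixed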


Material Editing

An illustration of the material editing performance. Along each row we increase the roughness of the material, while along each column we increase the specular value. The surface becomes more diffuse as the roughness increases, while increasing the specular value yields shinier surfaces. This illustrates TANGO's capability for material editing.
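
A sketch of such an edit, again reusing the hypothetical helpers from the Method sketch: the optimized roughness and specular maps are simply scaled before re-rendering, so no retraining is involved. The scale factors and the dict-style BRDF output are assumptions.

import torch

with torch.no_grad():
    base = svbrdf_net(x_p)                      # assumed to return {"diffuse", "roughness", "specular"}
    grid = []
    for r_scale in (0.5, 1.0, 2.0, 4.0):        # rows: larger roughness -> more diffuse surface
        row = []
        for s_scale in (0.5, 1.0, 2.0, 4.0):    # columns: larger specular -> shinier surface
            edited = dict(base,
                          roughness=base["roughness"] * r_scale,
                          specular=base["specular"] * s_scale)
            row.append(sg_render(edited, normal_net(x_p, n_p), sg_light, rays_d, mask))
        grid.append(row)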


Robustness Comparison

A robustness comparison of TANGO and Text2Mesh on high-quality meshes (top row) and downsampled meshes (second row). The results of Text2Mesh degrade significantly as the mesh quality decreases, while TANGO produces consistent results across meshes of varying quality.


Ablation Study

Ablation experiments on the proposed modules in TANGO, where the text prompt is “A candle made of wicker”.



Citation


@article{chen2022tango,
  title={Tango: Text-driven photorealistic and robust 3d stylization via lighting decomposition},
  author={Chen, Yongwei and Chen, Rui and Lei, Jiabao and Zhang, Yabin and Jia, Kui},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={30923--30936},
  year={2022}
}