GaussianVTON

3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting

1Northwestern Polytechnical University

2The University of Hong Kong

Abstract

The increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic already extensively covered in 2D VTON. Thanks to advances in 3D scene editing, 2D diffusion models can now be adapted for 3D editing via multi-viewpoint editing. In this work, we propose GaussianVTON, an innovative 3D VTON pipeline that integrates Gaussian Splatting (GS) editing with 2D VTON. To facilitate a seamless transition from 2D to 3D VTON, we propose, for the first time, using only images as editing prompts for 3D editing. To further address issues such as face blurring, garment inaccuracy, and degraded viewpoint quality during editing, we devise a three-stage refinement strategy that gradually mitigates these problems. Furthermore, we introduce a new editing strategy, termed Edit Recall Reconstruction (ERR), to overcome the limitations of previous editing strategies when handling complex geometric changes. Our comprehensive experiments demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON and establishing a new starting point for image-prompting 3D scene editing.
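At a high level, the pipeline described above alternates between editing rendered viewpoints with a 2D VTON model (prompted by a garment image rather than text) and updating the 3D Gaussian scene to match the edited views. The following is a minimal, hypothetical sketch of that loop; all function names and data shapes here are illustrative placeholders, not the authors' actual API.

```python
def edit_view(view, garment_image):
    """Placeholder for a 2D VTON model that redresses the person in `view`
    using `garment_image` as an image prompt (no text prompt involved)."""
    return {"pose": view["pose"], "pixels": view["pixels"] + "+" + garment_image}

def refit_gaussians(scene, edited_views):
    """Placeholder for re-optimizing the 3D Gaussians to match the edited
    multi-view images (ERR would instead rebuild the scene from them)."""
    scene["fitted_to"] = [v["pixels"] for v in edited_views]
    return scene

def try_on(scene, views, garment_image, num_rounds=3):
    """Multi-stage loop: each round edits every rendered viewpoint with the
    image prompt, then updates the 3D scene to agree with the edited views."""
    for _ in range(num_rounds):
        edited = [edit_view(v, garment_image) for v in views]
        scene = refit_gaussians(scene, edited)
    return scene
```

This sketch only conveys the control flow; in practice the refinement stages and ERR operate on real renders and Gaussian parameters rather than placeholder dictionaries.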

Results

"Make it look like autumn"

Citation

If you use this work or find it helpful, please consider citing it (BibTeX):

@article{chen2024gaussianvton,
  title={GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting},
  author={Chen, Haodong and Huang, Yongle and Huang, Haojian and Ge, Xiangsheng and Shao, Dian},
  journal={arXiv preprint arXiv:2405.07472},
  year={2024}
}

Acknowledgments

We would like to express our gratitude to Xiling Liu for generously volunteering the multi-view image data, and to Siru Zhong for his help in visualizing the results on this webpage. This website is built on the source code provided by Instruct-NeRF2NeRF, and the visualization of 3D GSs is primarily implemented using splat. We are grateful for these templates and tools.