MANIPULATION

Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors

Hritam Basak, Hamid Tabatabaee, Shreekant Gayaka, Mingfeng Li, Xin Yang, Cheng-Hao Kuo, Arnie Sen, Min Sun, Zhaozheng Yin

Year
2025
Citations
1

Abstract

Generating a 3D object from a single unposed RGB image is essential for robotic perception: reconstructing complete geometry and texture underpins precise manipulation, grasping, and scene understanding, which in turn are key to autonomous navigation and dexterous interaction. Recent image-to-3D methods combine Gaussian Splatting with pre-trained 2D or 3D diffusion models, but the two kinds of prior have complementary weaknesses: 2D models generate high-fidelity textures yet lack geometric consistency, while 3D models ensure structural coherence but produce overly smooth textures. To address this, we introduce a two-stage frequency-based distillation loss integrated with Gaussian Splatting. It draws geometric priors from the low-frequency spectrum of a 3D diffusion model for structural consistency, and high-frequency details from a 2D diffusion model for sharper textures. Our approach achieves state-of-the-art 3D reconstruction quality, significantly improving robotic perception pipelines. We also show that the method adapts easily to highly accurate object pose estimation and tracking, which is critical for precise robotic grasping, manipulation, and scene understanding. Additional results can be found in the supplementary file.
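To make the frequency-splitting idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the guidance from each diffusion prior is already available as an image-shaped array (`target_3d` and `target_2d` are hypothetical stand-ins), uses a simple radial FFT mask with an arbitrary `cutoff`, and omits the two-stage schedule mentioned in the abstract, whose details are not given here. All function names and weights are illustrative assumptions.

```python
import numpy as np


def radial_lowpass_mask(h, w, cutoff):
    """Boolean (h, w) mask keeping frequencies with radius <= cutoff (cycles/pixel)."""
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in [-0.5, 0.5)
    return np.sqrt(fx ** 2 + fy ** 2) <= cutoff


def split_frequencies(img, cutoff=0.1):
    """Split an (H, W) or (H, W, C) image into low- and high-frequency parts.

    The two parts sum back to the input image.
    """
    h, w = img.shape[:2]
    mask = radial_lowpass_mask(h, w, cutoff)
    if img.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over channels
    spectrum = np.fft.fft2(img, axes=(0, 1))
    low = np.fft.ifft2(spectrum * mask, axes=(0, 1)).real
    high = img - low
    return low, high


def hybrid_frequency_loss(render, target_3d, target_2d,
                          cutoff=0.1, w_low=1.0, w_high=1.0):
    """Hypothetical loss: match low frequencies to the 3D-prior target
    and high frequencies to the 2D-prior target (mean squared error each)."""
    r_low, r_high = split_frequencies(render, cutoff)
    t_low, _ = split_frequencies(target_3d, cutoff)
    _, t_high = split_frequencies(target_2d, cutoff)
    low_term = np.mean((r_low - t_low) ** 2)
    high_term = np.mean((r_high - t_high) ** 2)
    return w_low * low_term + w_high * high_term
```

In this sketch, a render built from the low band of `target_3d` plus the high band of `target_2d` drives the loss to (numerically) zero, which is the behavior the abstract's description suggests: structure from the 3D prior, texture detail from the 2D prior. In a real pipeline the render would come from a differentiable Gaussian Splatting rasterizer and the loss would be written in an autodiff framework rather than NumPy.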

Keywords

Prior probability, Gaussian, Object (grammar), Consistency (knowledge bases), 3D reconstruction, Key (lock), Texture mapping, Image (mathematics), Cognitive neuroscience of visual object recognition
