MMPart: Harnessing Multi-Modal Large Language Models for Part-Aware 3D Generation
Omid Bonakdar, Nasser Mozayani
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Generative 3D modeling has advanced rapidly, driven by applications in VR/AR, metaverse, and robotics. However, most methods represent the target object as a closed mesh devoid of any structural information, limiting editing, animation, and semantic understanding. Part-aware 3D generation addresses this problem by decomposing objects into meaningful components, but existing pipelines face challenges: in existing methods, the user has no control over which objects are separated and how model imagine the occluded parts in isolation phase. In this paper, we introduce MMPart, an innovative framework for generating part-aware 3D models from a single image. We first use a VLM to generate a set of prompts based on the input image and user descriptions. In the next step, a generative model generates isolated images of each object based on the initial image and the previous step's prompts as supervisor (which control the pose and guide model how imagine previously occluded areas). Each of those images then enters the multi-view generation stage, where a number of consistent images from different views are generated. Finally, a reconstruction model converts each of these multi-view images into a 3D model.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Fractional Differential Equations
Igor Podlubný
2025
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
Genetic Programming: On the Programming of Computers by Means of Natural Selection
John R. Koza
1992