A Proposal: Neural Generation of Novel Procedural Shape Programs from 3D Mesh Data

Shuyuan Zhang

September 2024

Motivation

Procedural modeling is a powerful tool for generating complex 3D shapes by manipulating a limited number of parameters, yet creating the procedural programs themselves is a challenging task. Much existing work in the Inverse Procedural Modeling domain relies on existing shape programs to automatically infer shape parameters; however, little work has focused on generating novel shape programs from scratch.

The ultimate goal of this field of study can be stated as follows: given a user request in arbitrary form, an ideal solution should generate a procedural program that produces the desired 3D shape while exposing parameters that are easy for the user to manipulate.

We may gradually decompose this grand goal into smaller, more manageable sub-tasks, as visualized in figure 1.

Figure 1: Steps in re-formulating the novel shape program synthesis task.

This proposal attempts to tackle the second-to-last step mentioned above: generating novel procedural shape programs from 3D mesh data.

Figure 2: Inference process.

Method Overview

The proposed method has a straightforward inference process (visualized in figure 2): given a 3D mesh as input, a state-of-the-art mesh encoder produces a feature embedding; a text-generation decoder network, trained on the gathered data, then produces a shape program as executable code that reproduces the input mesh; finally, with proper refactoring, we obtain a new procedural shape program that includes the original input mesh as one of its output variations.
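The inference process above can be sketched as follows. This is a minimal illustrative skeleton: every function here is a hypothetical stand-in for the corresponding trained component, not a real library API.

```python
# Illustrative sketch of the inference pipeline: mesh -> embedding ->
# shape program -> refactored program. All functions are hypothetical
# stand-ins for the trained components described in the proposal.

def encode_mesh(mesh):
    """Stand-in for a pretrained mesh encoder (e.g. a PointNet++-style model)."""
    # A real encoder would output a learned feature vector; we fake one here.
    return [float(len(mesh["vertices"])), float(len(mesh["faces"]))]

def decode_program(embedding):
    """Stand-in for the trained text-generation decoder network."""
    # A real decoder would generate executable shape-program code token by token.
    return "cube(size={:.0f})".format(embedding[0])

def refactor_program(program):
    """Stand-in for the refactoring step that exposes user-facing parameters."""
    return program.replace("size=8", "size=SIZE  # exposed parameter")

mesh = {"vertices": [(0, 0, 0)] * 8, "faces": [(0, 1, 2)] * 12}
program = refactor_program(decode_program(encode_mesh(mesh)))
print(program)  # -> cube(size=SIZE  # exposed parameter)
```

In a real system the placeholder bodies would be replaced by the trained encoder, decoder, and refactoring models, but the data flow between the three stages stays the same.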

Figure 3: Data preparation and training process.

In terms of data preparation and training, we take high-level inspiration from work on diffusion models, which creates training data by perturbing an original data entry. As visualized in figure 3, given an arbitrary procedural shape program, we create perturbed training data in two ways:

Prompting LLMs to perturb shape programs.
A simple LLM-based perturbation algorithm with feedback loops and potential for extension.

The rest of the data preparation and training process is also visualized in figure 3: we execute the shape programs in a shape engine to obtain the corresponding meshes, and encode them with the same mesh encoder used at inference time to obtain mesh embeddings. Pairing each mesh embedding with its shape program, we obtain a dataset for training the decoder network.
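The data-preparation loop above can be sketched as follows. The functions `perturb_with_llm`, `run_shape_engine`, and `encode_mesh` are hypothetical stand-ins for the LLM-based perturbation step, the shape engine, and the shared mesh encoder.

```python
# Sketch of the data-preparation loop from figure 3. Every function is a
# hypothetical stand-in, not a real API.

def perturb_with_llm(program, n_variants):
    """Stand-in for LLM-based perturbation of a seed shape program."""
    return [program.replace("1.0", str(1.0 + 0.5 * i)) for i in range(n_variants)]

def run_shape_engine(program):
    """Stand-in for executing a shape program to obtain its mesh."""
    return {"source": program}

def encode_mesh(mesh):
    """Stand-in for the same mesh encoder used at inference time."""
    return [float(len(mesh["source"]))]

seed_program = "sphere(r=1.0)"
dataset = []
for variant in perturb_with_llm(seed_program, n_variants=3):
    mesh = run_shape_engine(variant)
    dataset.append((encode_mesh(mesh), variant))  # (embedding, program) pair

print(len(dataset))  # one training pair per perturbed program
```

The key design point this illustrates is that the encoder is shared between data preparation and inference, so the decoder is trained on embeddings from exactly the distribution it will see at test time.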

Note that properly exposed parameters are a necessary part of a procedural model; thus, in the first step above we may record an additional dataset (of shape programs with exposed parameters paired with corresponding programs with embedded parameter values) for training a Shape Program Refactoring model, which is in essence another decoder model.
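A hypothetical record format for this refactoring dataset might pair the two versions of each program, for instance:

```python
# Illustrative record format for the Shape Program Refactoring dataset:
# each entry pairs a program with hard-coded (embedded) parameter values
# with an equivalent program that exposes them as named parameters.
# The field names and example programs are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class RefactoringPair:
    embedded: str   # program with literal parameter values baked in
    exposed: str    # equivalent program with named, user-facing parameters

pair = RefactoringPair(
    embedded="cylinder(h=4.0, r=1.5)",
    exposed="height = 4.0\nradius = 1.5\ncylinder(h=height, r=radius)",
)
```

The refactoring model would then learn the mapping from `embedded` to `exposed`, i.e. from literal-valued programs to parameterized ones.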

Related Work

OpenSCAD is a popular open-source tool for creating 3D models (especially procedural ones) directly from code. Previously, we used Blender's Python library to create DAG-based procedural shape programs; however, this required another open-source library to convert Python code into Blender's geometry node system, which adds complexity both to the process and to the grammar the decoder must learn.

There exists a wide range of choices for mesh encoders, such as PointNet++ and MeshCNN. As for decoder networks for text generation, LLMs like GPT-4 or LLaMA are natural choices; however, since these models require at least some fine-tuning and the cost would not be negligible, we may start with a similar but simpler architecture, such as a Transformer-based decoder.
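Whatever decoder architecture is chosen, the inference-time behavior is the same: generate program tokens autoregressively, conditioned on the mesh embedding. The toy sketch below illustrates just that loop; `next_token` is a hypothetical stand-in for the trained decoder's next-token prediction.

```python
# Toy sketch of autoregressive program decoding conditioned on a mesh
# embedding. next_token is a hypothetical stand-in for a trained
# Transformer decoder; here it deterministically emits a fixed program.

VOCAB = ["cube", "(", "size", "=", "2", ")", "<eos>"]

def next_token(embedding, prefix):
    """Stand-in: a real decoder would score all vocabulary tokens given
    the mesh embedding and the tokens generated so far."""
    return VOCAB[len(prefix)] if len(prefix) < len(VOCAB) else "<eos>"

def generate_program(embedding, max_len=16):
    tokens = []
    while len(tokens) < max_len:
        tok = next_token(embedding, tokens)
        if tok == "<eos>":
            break
        tokens.append(tok)
    return "".join(tokens)

print(generate_program([0.1, 0.2]))  # -> cube(size=2)
```

Swapping the stand-in for a real Transformer decoder changes only `next_token`; the generation loop and stopping condition remain identical.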

Discussions

The lack of an optimization loop in this process has its own pros and cons: it becomes easy to migrate the whole pipeline to 2D image input, but there is a chance that we cannot explicitly control the quality of the generated shape programs. In fact, the earliest versions of this proposal involved a purely text-based, prompting-driven solution using visual LMs, which carried the risk of a never-converging optimization loop; it can still serve as a fallback option in case the performance of the single-pass decoder network suffers, and when migrating to 2D, as mentioned previously, there are work-arounds for 3D mesh-based constraints. This proposal is thus open to further discussion.