Whether the input is a text description or a few images, our first step is to standardize and clean it.
For image data, for example, we identify the object of interest, crop it out of the image, and remove the background.
For textual inputs, we typically remove meaningless tokens and convert the remaining text into a more machine-readable format.
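To make the two cleanup paths concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not our actual implementation: the filler-token list, the function names `clean_text` and `crop_and_mask`, and the use of a precomputed binary mask to mark the object of interest are all hypothetical.

```python
import re

# Hypothetical stop list: which tokens count as "meaningless" is an assumption.
FILLER_TOKENS = {"a", "an", "the", "please", "um", "uh"}


def clean_text(description: str) -> list[str]:
    """Lowercase the text, drop punctuation, and remove filler tokens."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    return [t for t in tokens if t not in FILLER_TOKENS]


def crop_and_mask(image, mask, background=0):
    """Crop a 2D image to the bounding box of a binary mask and blank the rest.

    `image` and `mask` are equally sized lists of rows. The mask is assumed
    to come from some upstream step that has already located the object.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return []  # no object found, nothing to crop
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [
        [image[r][c] if mask[r][c] else background
         for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


tokens = clean_text("Um, a red wooden chair, please!")
patch = crop_and_mask(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[0, 0, 0], [0, 1, 1], [0, 0, 1]],
)
```

Here `tokens` holds only the content-bearing words, and `patch` is the 2x2 region around the masked object with the one unmasked pixel set to the background value.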
The next stage of our pipeline performs a fine-grained analysis of the input data, regardless of its format, to extract a compact, meaningful, and interpretable representation of its content, which we term the object code.
In the final stage of our pipeline, we generate the actual 3D asset from the object code together with a set of control parameters (e.g., level of detail, texture resolution) that affect how the object is represented.
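The three stages described above can be sketched as a small Python skeleton. Everything in it is a placeholder chosen for illustration: the class names `ObjectCode`, `ControlParams`, and `Asset3D`, the toy rule that treats the last token as the object category, and the mapping from level of detail to a triangle budget are assumptions, not the actual design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ControlParams:
    # Hypothetical defaults; the real parameter set is not specified here.
    level_of_detail: int = 2
    texture_resolution: int = 1024


@dataclass
class ObjectCode:
    # Compact, interpretable description of the object (toy stand-in).
    attributes: dict = field(default_factory=dict)


@dataclass
class Asset3D:
    object_code: ObjectCode
    triangle_budget: int
    texture_size: tuple


def preprocess(raw_text: str) -> list:
    """Stage 1: standardize the input (toy version: lowercase and split)."""
    return raw_text.lower().split()


def extract_object_code(tokens: list) -> ObjectCode:
    """Stage 2: toy analysis that treats the last token as the category."""
    category = tokens[-1] if tokens else None
    return ObjectCode({"category": category, "modifiers": tokens[:-1]})


def generate_asset(code: ObjectCode, params: ControlParams) -> Asset3D:
    """Stage 3: combine the object code with the control parameters."""
    # Assumed mapping: each level of detail quadruples the triangle budget.
    budget = 1000 * 4 ** params.level_of_detail
    size = (params.texture_resolution, params.texture_resolution)
    return Asset3D(code, budget, size)


def run_pipeline(raw_text: str, params: ControlParams) -> Asset3D:
    return generate_asset(extract_object_code(preprocess(raw_text)), params)


asset = run_pipeline(
    "Red wooden chair",
    ControlParams(level_of_detail=1, texture_resolution=512),
)
```

The point of the sketch is the separation of concerns: the object code depends only on the input, while the control parameters enter only at generation time, so the same object code can be rendered at several levels of detail without repeating the analysis.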