Traditionally, people have consumed visual content primarily in 2D form, namely images and videos. Although this differs from the way we experience the world through our visual system, humans have adapted well to consuming 2D visual content, a limitation imposed by the technologies available for capturing, transmitting and displaying visual content. As a result, until a few years ago consumer 3D content was limited to a few specific scenarios, such as animated movies and video games, where content is created by professional studios.
However, in recent years demand for 3D content has grown, driven by advances in computing power, bandwidth, hardware and software. These advances have given rise to new fields such as 3D printing, augmented and virtual reality (AR/VR) and 3D web-based experiences. In addition, the appetite of the movie and gaming segments for 3D models has been growing exponentially. Figure 1 shows the increase in the number of artists involved in successive releases of GTA, used here as a proxy for the increase in 3D models required.
Figure 1: Growth in number of technical artists through the different releases of GTA.
A particular emerging use case for 3D content is online shopping, where customers can observe and interact with products in 3D and, optionally, experience them virtually in their own surroundings using AR (see Figure 2 for an illustration). Examples include Houzz, Wayfair, Target and Ikea, to name a few.
Figure 2: Example of Augmented Reality use in retail
Figure 3: Three.js downloads per year
This phenomenon can be regarded as the latest link in the natural evolution of web-delivered content towards media types that are richer and more compelling to humans, analogous to the transition from the mostly textual content of the early internet to today's images, audio and video of ever-increasing resolution.
The realm of 3D-based consumer experiences has seen some major advances in the past years, mostly related to content delivery technologies such as 3D printers, AR/VR goggles and software platforms such as ARCore and ARKit. In contrast, the methods for creating 3D models have stagnated: despite improvements in 3D content authoring software and hardware tools, we have not seen a paradigm shift in this field. It is important to note that acquiring 3D visual content is inherently more complex than capturing 2D images, since something has to be done to obtain the additional dimension in the data.
Current methods for 3D asset creation
The existing methods for 3D model creation can be roughly divided into three categories: manual modeling, photogrammetry and 3D scanning. Let’s briefly review them:
Photogrammetry is a relatively popular method for 3D model creation, as it can produce high-quality models under certain conditions. Its main limitation is the need for a relatively large number of images (from a few dozen to thousands, depending on the model size and geometry), acquired under controlled conditions: full coverage of the reconstructed object, overlap between the different views and consistent lighting.
Like photogrammetry, 3D scanning has become more accessible in the past years. In particular, technological advances in microelectronics have made it possible to integrate 3D sensors into mobile devices.
The caveats of current 3D asset creation methods
Besides the technical limitations of the different asset creation methods discussed above, they all have inherent drawbacks. Photogrammetry and 3D scanning share a trivial yet often overlooked limitation: they require the physical presence of the imaged object. This may be a non-issue for a consumer creating a one-off model for recreational use, or for other small scale use cases. However, for larger scale uses, such as a retailer with thousands of items to model or a game developer requiring many 3D models just to create a realistic scene, those approaches are infeasible. In addition, both methods require a relatively long and involved acquisition process.
For those reasons, most at-scale uses of 3D content rely on manual modeling, for which existing 2D images depicting the object to be modelled suffice. Naturally, the main limitation here is the large amount of professional human labor required, which dominates the time and cost of creating the content. Creating one simple 3D model of an object typically costs at least tens of dollars and takes a few hours, while a complex object may cost hundreds of dollars and take a few days of work.
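To make the scale of this labor concrete, here is a rough back-of-the-envelope sketch. The catalog size, per-model cost, per-model hours and working-year length below are illustrative assumptions chosen within the ranges quoted above, not measured figures, and the function name is hypothetical.

```python
def manual_modeling_estimate(num_items, cost_per_model_usd, hours_per_model):
    """Return (total_cost_usd, total_hours) for modeling num_items by hand."""
    if num_items < 0:
        raise ValueError("num_items must be non-negative")
    return num_items * cost_per_model_usd, num_items * hours_per_model


# Assumed scenario: a retailer with 5,000 simple items,
# at $30 and 3 hours per model (hypothetical values).
total_cost, total_hours = manual_modeling_estimate(5000, 30.0, 3.0)

# Assumed working year: 40 hours per week, 48 weeks per year.
work_years = total_hours / (40 * 48)

print(f"Total cost: ${total_cost:,.0f}")
print(f"Total labor: {total_hours:,.0f} hours (~{work_years:.1f} artist-years)")
```

Even under these modest assumptions, the total runs to several artist-years of work, which is why labor dominates the economics of manual modeling at scale.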
Looking forward, we can identify some clear trends forming around 3D asset creation solutions. We separate those into small scale solutions, i.e. consumer-grade solutions for creating a handful of 3D models, and large scale solutions, suitable for businesses and enterprises looking to generate thousands of 3D models or more.
Small scale solutions
For consumer-level 3D model creation, we expect the current trend of 3D capture on mobile devices to become more and more common. This will involve either dedicated 3D sensing hardware embedded in the mobile device or on-device, consumer-facing photogrammetry software that creates 3D models from many 2D images or from a video.
Judging by recent academic publications in the field, these mobile-based solutions are likely to improve in the near future in computational performance, features and quality, using AI techniques such as Neural Radiance Fields.
Large scale solutions
Following the analysis presented in the previous sections, it is clear that any solution for 3D model creation at scale cannot rely on 360-degree acquisition of the object to be reconstructed, because such an acquisition process is lengthy and cumbersome. We also remind the reader that there are many applications where the object itself is not physically available. Therefore, we believe that any scalable solution must be based on existing and ubiquitous data, namely available 2D images of the objects to be reconstructed.
Once again examining state-of-the-art academic publications, we see that AI-based 3D reconstruction from a single image is an active research topic; see e.g. the papers here and here. Notably, very few works use several 2D images as input to the 3D reconstruction process, and those typically require additional information about the acquisition setup (i.e. camera parameters), which is unknown in practice.
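To illustrate why camera parameters matter, here is a minimal sketch of the standard pinhole camera model, which maps a 3D point to a pixel. It is a generic textbook model given as an assumption for illustration, not the method of any particular paper; the function name and all numeric values are hypothetical.

```python
def project_point(point_world, rotation, translation, focal_px, principal_point):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    rotation (3x3) and translation (3,) are the camera extrinsics;
    focal_px and principal_point are the camera intrinsics.
    """
    # World -> camera coordinates: X_c = R * X_w + t
    xc = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division, then scaling and shifting by the intrinsics
    u = focal_px * xc[0] / xc[2] + principal_point[0]
    v = focal_px * xc[1] / xc[2] + principal_point[1]
    return (u, v)


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
point = (0.1, -0.05, 0.0)

# The same 3D point seen by two cameras that differ only in focal length
pixel_a = project_point(point, IDENTITY, (0.0, 0.0, 2.0), 1000.0, (640.0, 360.0))
pixel_b = project_point(point, IDENTITY, (0.0, 0.0, 2.0), 1500.0, (640.0, 360.0))
print(pixel_a, pixel_b)
```

The same point lands on different pixels when only the focal length changes. A multi-view method therefore has to either be given these parameters or estimate them, and for images collected from existing repositories they are rarely recorded.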
Our vision for the future of 3D asset creation is that the capability of creating 3D assets from a handful of existing images will become commercially available in the very near future. At first, this technology will be utilized for mass creation of 3D models of common items such as electrical appliances, cars, furniture, buildings, etc. by large enterprises looking to convert large repositories of existing images to 3D. Further ahead, this technology is likely to be adopted for consumer applications, unlocking additional capabilities on mobile devices.
Our mission around this is very simple: to make it happen. To that end, we teamed up with 3D technical artists to learn how they approach 3D modelling. We also researched the latest computer vision and AI algorithms and, through trial and error, integrated this knowledge to build a working pipeline. During this process we realized that even the best algorithms won’t do without good datasets and the compute power required to train them. To address those issues, we complemented our algorithmic pipeline with a holistic infrastructure capable of handling our data and training our Deep Learning models quickly and cost-effectively.