Synthetic Data Generation for Computer Vision in Blender

(part 1)

Example synth data from thiscatdoesnotexist.com, paralleldomain.com and Microsoft “Fake It Till You Make It”

What: This entry gives an introduction to synthetic data generation and how you can use it via Blender to train performant and robust vision models. We’ll provide an overview of the Blender setup and, for demonstrative purposes, present a concrete visual classification scenario from the fashion domain.

Why: to leverage Blender’s procedural capabilities and adopt a data-centric approach to get better machine-learning models with little or no need for human annotations.

Who: we will rely on Blender > 3.1 and Python ≥ 3.7. Generated images can then be used for any downstream task, regardless of the framework those tasks depend on (e.g. TensorFlow, PyTorch).

Synthetic Data Generation (SDG) encompasses a variety of methods that aim at programmatically generating data to support downstream tasks. In statistics and Machine Learning (ML), the goal is to synthesize samples with the same distribution as a target domain, to be used for model training or testing. It is part of the data-centric ML approach, where, in order to achieve better performance, we actively work on the data instead of the models, algorithms, or architectures.

SDG is adopted for multiple reasons, the primary ones being:

  • minimize the need for human labeling and curation
  • facilitate and/or reach the data requirements of ever-higher-capacity models
  • address issues such as generality, robustness, portability, and biases
  • overcome real-data usage restrictions (privacy and regulations)

In Computer Vision (CV) we are interested in synthesizing realistic visual samples, most commonly images and videos. The two major approaches to synthesizing data for this domain are generative models and Computer Graphics (CG) pipelines. Hybrid approaches exist that combine multiple methods in different measures based on the target setup.

Think for example about generating images of non-existing cats to train a cat-vs-dog classifier, feeding images from games and simulated environments to bootstrap the training of self-driving systems, or rendering an unlimited variety of CG human faces for landmark localization.

In this article we will focus on the CG approach, which relies on traditional software and tools to manipulate 3D content (modeling), apply materials (texturing), and synthesize 2D images (rendering). Let there be a Blender.

Blender is a free and open-source 3D CG software toolset. It has undergone extraordinary improvements over the last couple of years, especially with the revamped 2.8 version: a redesigned user interface and workspace, the real-time Eevee renderer, an optimized Cycles (path-tracing engine), 2D animation with Grease Pencil, better shader nodes and the recent, extra-powerful Geometry Nodes system. All this (and more), plus the Python API, makes it a popular choice for researchers and hobbyists interested in programmatic and procedural control of a 3D environment, without the need to rely on less user-friendly 3D tools or libraries.

For an introduction to scripting in Blender, see the first section of one of my previous entries, or check this primer by Jeremy Behreandt.

The power of a programmatic setup for SDG (and beyond) can be demonstrated with a few lines of code. The following Python snippet randomizes the camera location in a controlled way. If you have a Track To constraint on your camera, you are also guaranteed that it will point at the same location from different angles.
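A minimal sketch of such a snippet could look as follows (the camera name "Camera" and the sampling ranges are assumptions to adapt to your own scene):

```python
import math
import random

import bpy

def randomize_camera_location(camera_name="Camera",
                              radius_range=(4.0, 8.0),
                              z_range=(0.5, 3.0)):
    """Move the camera to a random point on a ring around the scene origin.

    Assumes a camera object named `camera_name`, ideally with a Track To
    constraint targeting the object (or an Empty) it should keep framing.
    """
    camera = bpy.data.objects[camera_name]
    radius = random.uniform(*radius_range)
    angle = random.uniform(0.0, 2.0 * math.pi)
    camera.location = (radius * math.cos(angle),
                       radius * math.sin(angle),
                       random.uniform(*z_range))
```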

The following snippet shows instead how to randomize the World (meaning the scene environment) in terms of background color and light intensity. All it requires is a basic node setup as shown here.

Basic World setup to randomize
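A minimal sketch of the World randomization, assuming the default World node tree with a single Background node:

```python
import random

import bpy

def randomize_world(strength_range=(0.2, 2.0)):
    """Randomize the background color and light intensity of the active World.

    Assumes the World node tree contains the default 'Background' node.
    """
    background = bpy.context.scene.world.node_tree.nodes["Background"]
    # Random RGB background color (alpha kept at 1.0)
    background.inputs["Color"].default_value = (
        random.random(), random.random(), random.random(), 1.0)
    # Random emission strength, acting as the overall environment light intensity
    background.inputs["Strength"].default_value = random.uniform(*strength_range)
```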

At this point, any object placed in the scene can already be rendered from different angles, under different light conditions and backgrounds.
The following code is all that’s required to render the current scene to file.
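For example (the output path here is just a placeholder):

```python
import bpy

def render_to_file(filepath):
    """Render the current scene with the active camera and write it to disk."""
    bpy.context.scene.render.filepath = filepath
    bpy.ops.render.render(write_still=True)

render_to_file("/tmp/render_0001.png")
```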

One can expand on these basic blocks and automate/randomize any aspect relevant to their needs. Rendered images can then be used in any downstream task. The real-time Eevee renderer guarantees that you can generate images in seconds or less, and so can easily scale to data-hungry regimes. Cycles is also an option if you need higher realism, but you are then looking at longer rendering times and a stronger dependency on a good GPU.

On top of this, by using Blender you have access to all the 3D scene and object information, and can use render passes to split targeted content into separate rendered images (e.g. depth, normal maps, ambient occlusion, segmentation maps via object or material indexes).
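As a small sketch of how this looks via the Python API, the relevant passes can be toggled on the active view layer (the object name "MyGarment" below is hypothetical):

```python
import bpy

view_layer = bpy.context.view_layer
# Enable extra passes, rendered alongside the color image
view_layer.use_pass_z = True                  # depth
view_layer.use_pass_normal = True             # normal map
view_layer.use_pass_ambient_occlusion = True  # ambient occlusion
view_layer.use_pass_object_index = True       # per-object index (Cycles) for segmentation maps

# Give an object a non-zero index so it can be isolated in the object-index pass
bpy.data.objects["MyGarment"].pass_index = 1
```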

For this entry, however, we will focus on the visual classification setting alone, and explore procedural materials using a simple but concrete classification example from the fashion domain: textile pattern classification.

Example of fashion items with different pattern types.

Let’s introduce the use case: we have a visual classification task where we need to classify the pattern of a fashion item. It is a multi-class problem, where we rely on a set of previously defined classes (e.g. plain, striped, dotted, floral, checkered, animal-print). We all have an intuitive understanding of these classes, and they apply to any fashion item, be it a dress, a pair of sneakers, or a bag. We could manually collect and curate data for this task, but what about just synthetically generating it via Blender?

We first need to obtain some 3D objects appropriate for our target data distribution. We can easily find a plethora of 3D fashion content, in premium packages or even as single free models. We could even approach modeling itself procedurally, but that’s a topic that will be covered in a separate future entry.

Once we have our 3D objects, we can start working on materials.
We show here the node trees for three sample materials: plain, striped and floral. The first two can be obtained purely in Blender. For floral we rely instead on external images that need to be downloaded separately.

Plain material node-tree
Striped material node-tree
Floral material node-tree

In addition to such materials we only need the following code: a naive function to generate random colors and a wrapping function for each target class.
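A minimal sketch for the plain and striped classes could look like this (the material names 'plain' and 'striped' and the node names 'Principled BSDF' and 'ColorRamp' are assumptions that must match your Blender file):

```python
import random

import bpy

def random_color():
    """Naive random RGBA color."""
    return (random.random(), random.random(), random.random(), 1.0)

def randomize_plain():
    """Randomize the base color of the 'plain' material."""
    nodes = bpy.data.materials["plain"].node_tree.nodes
    nodes["Principled BSDF"].inputs["Base Color"].default_value = random_color()

def randomize_striped():
    """Randomize the two stripe colors of the 'striped' material.

    Assumes the stripes are driven by a color ramp node named 'ColorRamp'.
    """
    nodes = bpy.data.materials["striped"].node_tree.nodes
    ramp = nodes["ColorRamp"].color_ramp
    ramp.elements[0].color = random_color()
    ramp.elements[-1].color = random_color()
```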

When run, such functions randomize the corresponding Blender material (the names used in the code must match those in Blender, both for the overall node trees and for the specific nodes).

If we combine the above setup with the randomization provided in the previous section, we can start rendering.
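Putting the sketches above together, a hypothetical generation loop might look like this (the class set, sample count and output paths are placeholders):

```python
# Relies on the helper functions sketched earlier in this entry:
# randomize_camera_location, randomize_world, randomize_plain,
# randomize_striped and render_to_file.
N_IMAGES_PER_CLASS = 100
randomizers = {
    "plain": randomize_plain,
    "striped": randomize_striped,
}

for class_name, randomize_material in randomizers.items():
    for i in range(N_IMAGES_PER_CLASS):
        randomize_camera_location()
        randomize_world()
        randomize_material()
        render_to_file(f"/tmp/synth/{class_name}/{i:04d}.png")
```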

Random sample of rendered images

With the generated images you can already try to train a vision model and see how it performs on your real, target-domain data. You don’t need to collect, scrape and curate real images, you don’t need human annotations and validation, and you have a fully programmatic setup to expand upon and further exploit.

While this was a toy use case, the more complex or niche your domain, or the more underrepresented your concepts/classes, the more an SDG approach will save you from the pain of data collection and curation.

A synthetic dataset is good when it is versatile and capable of generalizing to real datasets. Synthetic data can be used in a multitude of ways. It can be a supplement to already available real data during training, where different ratios of synthetic vs real data are often tested to find the best compromise for model performance. Sometimes it can even be just a means to test/validate the robustness of vision models.

The effectiveness of using synthetic data often needs to be tested empirically, but multiple recent works and papers have showcased the power of SDG.

One of the main limitations is the so-called domain gap, meaning the intrinsic differences between real and synthetic images. For our Blender example, more varied and detailed 3D objects, richer textures, or using the Cycles renderer would produce images that look more realistic, but there would still be a considerable gap compared to real fashion images, even more so when human models are included. Specific lines of research, like domain adaptation, look at how to tackle such issues.

SDG also requires expert domain knowledge (the same that would be needed for human annotations) and an overall quality-control setup, to avoid an even wider and more damaging domain gap. There is always a risk of injecting implicit biases into the generation process, or entirely missing possible outlier (yet relevant) areas of your domain. Take again our toy use case: we are required to have a clear understanding of our guidelines regarding what makes a pattern floral. Does it need to be a realistic flower depiction? Does it need to be colorful enough? How many levels of stylization are acceptable? Without clear answers to such questions we would end up defining the synthetic generation process based on our biases and assumptions, and then most likely fail to capture the requirements of our real data.

In this entry, we gave a high-level introduction to Synthetic Data Generation (SDG) and how it can be easily approached via Blender.

The pattern classification setting allowed us to demonstrate how Blender materials and Python scripting can be combined to generate synthetic data via a traditional computer-graphics pipeline.

This is a simple yet powerful setup, which can already be a starting point for a multitude of computer-vision use-cases.

The plan for future entries of this series is to expand on the introduction presented here and explore different and more complex scenarios for SDG in the visual domain, while showcasing how Blender can be used as a powerful tool to tackle them.

We more than welcome feedback and suggestions to drive such future entries. Overall we are planning to cover aspects such as depth, segmentation, and other scene information that you get for free when relying on Blender. We also want to go beyond pure material node trees to tackle procedural modeling and see, for example, how geometry nodes can be used to manipulate meshes for SDG.

From there we’ll move to hybrid approaches for SDG, considering the important recent advances made in the field of machine learning for computer graphics, like object/body generation and reconstruction, generative models for texture synthesis, and neural rendering.

Want to Connect? You can find more of my experiments and explanations on Twitter and see my graphics results on Instagram.
