So you’ve trained your machine learning model and now it’s time to inspect the results. A typical method - or at least what I did a lot - is generating random images. But surely there are better ways to explore your creation. One such way is interpolation: we can take two images and observe how the model behaves while transitioning between them.

Image of interpolation on Icon GAN model

I’m always getting caught up in off-by-one errors on these, so let’s write it down… and hopefully I can refer to it later :)

If you just want to try it out, you can skip to the bottom or check out Icon GAN.

Background

Generative adversarial networks (GANs) are a class of neural networks. One part, the generator, produces images, while another part, the discriminator, distinguishes the generated (fake) images from the real images in the training set.

The generator is typically given a randomized input. The distribution of that input is called the latent space. Once the model is trained, the latent space becomes an embedding of the images: images that look alike typically lie close together in the latent space. Depending on the training method, different directions in the latent space can also correspond to certain features - like glasses or beards in face images, or shapes and colors more generally. StyleGAN 2, for example, explicitly aims to improve the mapping from latent space vectors to the generated images.
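
As an illustration (with \(z\) a latent vector, \(d\) a hypothetical feature direction and \(\alpha\) a step size), shifting an image along such a feature is simple vector arithmetic:

\[z' = z + \alpha \cdot d\]

Interpolation, which we will set up below, is the same idea with \(d\) pointing from one image's input to the other's.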

In StyleGAN 2, the latent space consists of one w × h matrix and multiple vectors. These are given to different layers of the network.

Image of StyleGAN 2 generator network with inputs

Loading the model

To generate images, we need to load a model. In TensorFlow.js, we can do this using:

const modelUrl = "https://raw.githubusercontent.com/Akatuoro/nn-models/master/icons-64-web/model.json";
const model = await tf.loadGraphModel(modelUrl);
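
These snippets assume tf is available as a global, e.g. loaded via a <script> tag. If you are bundling your own code, you can instead import it from the @tensorflow/tfjs npm package:

import * as tf from '@tensorflow/tfjs';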

The model has to be in the correct format. If you have a TensorFlow SavedModel or a Keras model, you can convert it using tfjs-converter.

Generating base images

We first need two images between which we interpolate. Or rather, we need their inputs in the latent space. Let’s call them A and B. They could be preselected - e.g. via drag & drop, like in Icon GAN. But for this example, we generate them randomly:

const inputA = getRandomInput();
const inputB = getRandomInput();

So what is getRandomInput?


This specific model takes one 64 × 64 × 1 matrix and five vectors of length 512 as input. When passing them as an array, the matrix needs to be in the 3rd position. Since we will want to generate batches, we add an additional batch dimension at the beginning of each tensor.

function getRandomInput() {
    // the leading 1 in each shape is the batch dimension
    return [
        tf.randomNormal([1, 512]),
        tf.randomNormal([1, 512]),
        tf.randomUniform([1, 64, 64, 1]), // the 64 x 64 x 1 matrix, in 3rd position
        tf.randomNormal([1, 512]),
        tf.randomNormal([1, 512]),
        tf.randomNormal([1, 512])
    ];
}
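
If you are working with a different model and are unsure what inputs it expects, the loaded graph model exposes input metadata that can help (names, shapes and ordering depend on how the model was exported):

// inspect the expected inputs of the loaded graph model
console.log(model.inputs.map(i => `${i.name}: ${i.shape}`));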

To test that image generation works, we generate and render image A:

const outputA = model.execute(inputA);
const imageA = toImg(outputA);
ctx.putImageData(imageA, 0, 0);
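
Here, ctx is a 2D canvas rendering context - obtained along these lines, assuming a <canvas> element with id "canvas" on the page:

// assuming a <canvas id="canvas"> element in the HTML, wide enough for the output
const canvas = document.getElementById('canvas');
const ctx = canvas.getContext('2d');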

But our output is not yet an image - we need to clip any values outside the [0, 1] range, add an alpha channel and multiply by 255 to put it into an ImageData object. Also, we transpose the output tensor so that the images of a batch end up side by side in one big image.

function toImg(tensor) {
    const n = tensor.shape[0]; // batch size

    // clip to [0, 1], append an alpha channel & scale to [0, 255]:
    let d = tf
        .concat([tensor.clipByValue(0, 1), tf.ones([n, 64, 64, 1])], 3)
        .mul(255);

    // swap batch and row dimensions so the images end up side by side
    d = tf.transpose(d, [1, 0, 2, 3]);

    const im_data = new ImageData(new Uint8ClampedArray(d.dataSync()), 64 * n, 64);

    return im_data;
}

If you want multiple single images instead, you can iterate over the batch dimension - note that the transpose step has to be skipped in this case:

// clip, alpha channel & multiply as above, but without the transpose
const d = tf.concat([tensor.clipByValue(0, 1), tf.ones([n, 64, 64, 1])], 3).mul(255);

tf.split(d, n).forEach(piece => {
    const im_data = new ImageData(new Uint8ClampedArray(piece.dataSync()), 64, 64);
});

Interpolation

Now that we have two images and have verified that basic image generation works, we can start interpolating. We take the inputs of our two images and linearly interpolate between them, producing \(n\) inputs \(Z_t\) that include the base inputs:

\[Z_t = A + (B - A) \cdot \frac{t}{n-1}, \qquad t = 0, \ldots, n-1\]
Image of interpolation steps between A and B, in steps of 0.1


After calculating the inputs \(Z_t\), we concatenate them along the first axis for batch processing. Since the model takes multiple input tensors, we need to perform these operations for each tensor:

// precalculate B - A
const v = inputB.map((t, i) => t.sub(inputA[i]));

const n = 9;
const input = inputA.map((t, i) => {
    const combined = [];
    // calculate all Z_t for the current input tensor index
    for (let j = 0; j < n; j++) {
        combined.push(t.add(v[i].mul(j / (n - 1))));
    }
    // concatenate Z_t for batch processing
    return tf.concat(combined);
});
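
As a quick sanity check: with the shapes from getRandomInput above, each concatenated tensor should now have \(n = 9\) as its first dimension:

// expected output: [9, 512], [9, 512], [9, 64, 64, 1], [9, 512], [9, 512], [9, 512]
input.forEach(t => console.log(t.shape));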

Now we simply execute the model and render the resulting image using the toImg function defined earlier:

const output = model.execute(input);
const imageData = toImg(output);
ctx.putImageData(imageData, 0, 0);
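
One caveat: TensorFlow.js tensors hold on to (GPU) memory until they are disposed. As a minimal sketch, the same step can be wrapped in tf.tidy, which disposes every tensor created inside once the callback returns - the returned ImageData holds plain pixel data, so nothing we still need is affected:

// all intermediate tensors (model output, clipped data, ...) are disposed afterwards
const imageData = tf.tidy(() => toImg(model.execute(input)));
ctx.putImageData(imageData, 0, 0);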

Check out the resulting CodePen:

See the Pen GAN Interpolation by Akatuoro (@akatuoro) on CodePen.



What model are you going to interpolate on?