Generative Adversarial Networks
ACTL3143 & ACTL5111 Deep Learning for Actuaries
GANs consist of two neural networks, a generator and a discriminator, which are trained simultaneously through adversarial training. The generator takes in random noise and produces a synthetic data observation; its goal is to learn to generate synthetic data that closely resembles the real data. The discriminator takes real and synthetic observations and classifies each as ‘real’ or ‘fake’; its goal is to correctly identify whether its input is real or synthetic. An equilibrium is reached when the generator produces data that closely resembles the real data and the discriminator can no longer tell the two apart with high confidence.
Traditional GANs
Before GANs we had autoencoders
An autoencoder takes an input (e.g. an image), maps it to a latent space via an encoder module, then decodes it back to an output with the same dimensions via a decoder module.
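As a rough illustration (not the architecture used later in these slides), a minimal autoencoder for 28×28 greyscale images could look like the following, with an assumed 32-dimensional latent space:

from tensorflow import keras
from tensorflow.keras import layers

encoding_dim = 32  # assumed size of the latent space

# Encoder: image -> latent vector
encoder = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(encoding_dim)])

# Decoder: latent vector -> reconstructed image
decoder = keras.Sequential([
    keras.Input(shape=(encoding_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28, 1))])

# The autoencoder is trained to reproduce its own input.
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")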
GAN faces
Try out https://www.whichfaceisreal.com.
Example StyleGAN2-ADA outputs
GAN structure
GAN intuition
Intuition about GANs
- A forger creates a fake Picasso painting to sell to an art dealer.
- The art dealer assesses the painting.
How they best each other:
- The art dealer is given both authentic and fake paintings to assess. Later on, the validity of his assessments is evaluated, and he trains to become better at detecting fakes. Over time, he becomes increasingly expert at authenticating Picasso’s artwork.
- The forger receives an assessment from the art dealer every time he submits a fake. Whenever the art dealer spots the forgery, he knows he must refine his craft. Over time, he becomes increasingly adept at imitating Picasso’s style.
Generative adversarial networks
- A GAN is made up of two parts:
- Generator network: the forger. Takes a random point in the latent space and decodes it into a synthetic data point or image.
- Discriminator network (or adversary): the expert. Takes a data point or image and decides whether it comes from the original data set (the training set) or was created by the generator network.
Discriminator
# Imports used throughout these slides.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

lrelu = layers.LeakyReLU(alpha=0.2)
discriminator = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(64, 3, strides=2, padding="same", activation=lrelu),
    layers.Conv2D(128, 3, strides=2, padding="same", activation=lrelu),
    layers.GlobalMaxPooling2D(),
    layers.Dense(1)])
discriminator.summary()
Generator
latent_dim = 128
generator = keras.Sequential([
    layers.Dense(7 * 7 * 128, input_dim=latent_dim, activation=lrelu),
    layers.Reshape((7, 7, 128)),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation=lrelu),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation=lrelu),
    layers.Conv2D(1, 7, padding="same", activation="sigmoid")])
generator.summary()
Training GANs
GAN cost functions
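In the standard formulation (Goodfellow et al., 2014), with real data X and latent noise Z, the discriminator and generator play the minimax game
\min_G \max_D \; \mathbb{E}\bigl[ \log D(X) \bigr] + \mathbb{E}\bigl[ \log \bigl( 1 - D(G(Z)) \bigr) \bigr] ,
so the discriminator’s loss is the binary cross-entropy between its predictions and the real/fake labels, while the generator is trained to make the discriminator label its outputs as real. In practice the generator often maximises \log D(G(Z)) instead (the “non-saturating” loss), which gives stronger gradients early in training.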
GAN - Schematic process
First step: Training discriminator:
- Draw random points in the latent space (random noise).
- Use the generator to generate data from this random noise.
- Mix the generated data with real data and feed the mixture into the discriminator. The training targets are the correct labels (‘real’ or ‘fake’). Train the discriminator to minimise the loss function, which measures the difference between the discriminator’s predictions and the correct labels.
GAN - Schematic process II
Second step: Training generator:
- Draw random points in the latent space and generate data with the generator.
- Use the discriminator to give feedback on the generated data. The generator tries to fool the discriminator into believing that all generated data are real. Train the generator to minimise the loss function, which measures the difference between the discriminator’s predictions and the desired (but false) targets: “all data are real”.
GAN - Schematic process III
- When training, the discriminator may end up dominating the generator because the loss function for training the discriminator tends to zero faster. In that case, try reducing the discriminator’s learning rate and increasing its dropout rate.
- There are a few tricks for implementing GANs, such as introducing stochasticity by adding random noise to the labels for the discriminator, using strides instead of pooling in the discriminator, using a kernel size that is divisible by the stride, etc.
Train step
# Separate optimisers for discriminator and generator.
d_optimizer = keras.optimizers.Adam(learning_rate=0.0003)
g_optimizer = keras.optimizers.Adam(learning_rate=0.0004)

# Instantiate a loss function.
loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(real_images):
    # Sample random points in the latent space
    random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))
    # Decode them to fake images
    generated_images = generator(random_latent_vectors)
    # Combine them with real images
    combined_images = tf.concat([generated_images, real_images], axis=0)

    # Assemble labels discriminating real from fake images
    labels = tf.concat([
        tf.zeros((batch_size, 1)),
        tf.ones((real_images.shape[0], 1))], axis=0)
    # Add random noise to the labels - important trick!
    labels += 0.05 * tf.random.uniform(labels.shape)

    # Train the discriminator
    with tf.GradientTape() as tape:
        predictions = discriminator(combined_images)
        d_loss = loss_fn(labels, predictions)
    grads = tape.gradient(d_loss, discriminator.trainable_weights)
    d_optimizer.apply_gradients(zip(grads, discriminator.trainable_weights))

    # Sample random points in the latent space
    random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))

    # Assemble labels that say "all real images"
    misleading_labels = tf.ones((batch_size, 1))

    # Train the generator (note that we should *not* update the weights
    # of the discriminator)!
    with tf.GradientTape() as tape:
        predictions = discriminator(generator(random_latent_vectors))
        g_loss = loss_fn(misleading_labels, predictions)

    grads = tape.gradient(g_loss, generator.trainable_weights)
    g_optimizer.apply_gradients(zip(grads, generator.trainable_weights))
    return d_loss, g_loss, generated_images
Grab the data
# Prepare the dataset.
# We use both the training & test MNIST digits.
batch_size = 64
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
all_digits = np.concatenate([x_train, x_test])
all_digits = all_digits.astype("float32") / 255.0
all_digits = np.reshape(all_digits, (-1, 28, 28, 1))
dataset = tf.data.Dataset.from_tensor_slices(all_digits)
dataset = dataset.shuffle(buffer_size=1024).batch(batch_size)

# In practice you need at least 20 epochs to generate nice digits.
epochs = 1
save_dir = "./"
Train the GAN
%%time
for epoch in range(epochs):
    for step, real_images in enumerate(dataset):
        # Train the discriminator & generator on one batch of real images.
        d_loss, g_loss, generated_images = train_step(real_images)

        # Logging.
        if step % 200 == 0:
            # Print metrics
            print(f"Discriminator loss at step {step}: {d_loss:.2f}")
            print(f"Adversarial loss at step {step}: {g_loss:.2f}")
        break  # Remove this if really training the GAN
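Once the GAN is properly trained (with the break removed and more epochs), new digits can be sampled straight from the generator. A minimal sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt

# Decode a batch of random latent vectors into images.
random_latent_vectors = tf.random.normal(shape=(16, latent_dim))
fake_digits = generator(random_latent_vectors).numpy()

fig, axes = plt.subplots(4, 4, figsize=(4, 4))
for img, ax in zip(fake_digits, axes.flat):
    ax.imshow(img.squeeze(), cmap="gray")
    ax.axis("off")
fig.savefig(save_dir + "generated_digits.png")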
Conditional GANs
Unconditional GANs
Conditional GANs
Hurricane example data
Hurricane example
Hurricane example (after 54s)
Hurricane example (after 21m)
Hurricane example (after 47m)
Hurricane example (after 4h10m)
Hurricane example (after 14h41m)
Image-to-image translation
Example: Deoldify images #1
Example: Deoldify images #2
Example: Deoldify images #3
Explore the latent space
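One simple way to explore the latent space is to interpolate between two random latent vectors and decode every intermediate point with the trained generator. A sketch, reusing the generator and latent_dim defined earlier:

# Two random points in the latent space.
z_start = tf.random.normal(shape=(1, latent_dim))
z_end = tf.random.normal(shape=(1, latent_dim))

# Linearly interpolate between them and decode each intermediate point.
steps = 10
alphas = tf.reshape(tf.linspace(0.0, 1.0, steps), (-1, 1))
z_path = (1 - alphas) * z_start + alphas * z_end  # shape (steps, latent_dim)
interpolated_images = generator(z_path)           # shape (steps, 28, 28, 1)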
Generator can’t generate everything
Problems with GANs
They are slow to train
StyleGAN2-ADA training times on V100s (1024x1024):
| GPUs | 1000 kimg | 25000 kimg | sec / kimg | GPU mem | CPU mem |
|------|-----------|------------|------------|---------|---------|
| 1    | 1d 20h    | 46d 03h    | 158        | 8.1 GB  | 5.3 GB  |
| 2    | 23h 09m   | 24d 02h    | 83         | 8.6 GB  | 11.9 GB |
| 4    | 11h 36m   | 12d 02h    | 40         | 8.4 GB  | 21.9 GB |
| 8    | 5h 54m    | 6d 03h     | 20         | 8.3 GB  | 44.7 GB |
Uncertain convergence
Converges to a Nash equilibrium... if at all.
Mode collapse
Generation is harder
# Separate optimisers for discriminator and generator.
d_optimizer = keras.optimizers.Adam(learning_rate=0.0003)
g_optimizer = keras.optimizers.Adam(learning_rate=0.0004)
Advanced image layers
Conv2D
GlobalMaxPool2D
Conv2DTranspose
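A quick way to get a feel for these layers is to check their output shapes on a dummy input; the sizes below are purely illustrative:

x = tf.zeros((1, 28, 28, 1))  # a dummy batch of one 28x28 greyscale image

# Conv2D with stride 2 halves the spatial dimensions.
print(layers.Conv2D(64, 3, strides=2, padding="same")(x).shape)           # (1, 14, 14, 64)

# GlobalMaxPooling2D collapses each feature map to a single number.
print(layers.GlobalMaxPooling2D()(tf.zeros((1, 14, 14, 64))).shape)       # (1, 64)

# Conv2DTranspose with stride 2 doubles the spatial dimensions (upsampling).
print(layers.Conv2DTranspose(64, 3, strides=2, padding="same")(x).shape)  # (1, 56, 56, 64)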
Vanishing gradients (I)
Vanishing gradients (II)
Wasserstein GAN
We’re comparing distributions
We are trying to minimise the distance between the distribution of generated samples and the distribution of the real data.
The vanilla GAN objective is equivalent to minimising the Jensen–Shannon divergence between the two.
An alternative distance between distributions is the Wasserstein distance.
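For two distributions P and Q, the (1-)Wasserstein distance can be written in its Kantorovich–Rubinstein dual form as
W_1(P, Q) = \sup_{\| f \|_L \le 1} \mathbb{E}_{X \sim P}[ f(X) ] - \mathbb{E}_{Y \sim Q}[ f(Y) ] ,
where the supremum runs over all 1-Lipschitz functions f. The critic introduced next approximates exactly this supremum.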
Discriminator → Critic
The critic D : \text{Input} \to \mathbb{R} scores how “authentic” the input looks. It does not classify inputs as strictly real or fake.
The critic’s goal is
\max_{D \in \mathscr{D}} \mathbb{E}[ D(X) ] - \mathbb{E}[ D(G(Z)) ]
where \mathscr{D} is the space of 1-Lipschitz functions. To enforce this, either clip the critic’s weights or penalise gradient norms far from 1:
\max_{D} \mathbb{E}[ D(X) ] - \mathbb{E}[ D(G(Z)) ] - \lambda \, \mathbb{E} \Bigl[ \bigl( \bigl\| \nabla D \bigr\| - 1 \bigr)^2 \Bigr] .
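A minimal sketch of this gradient-penalty term in TensorFlow follows; it is not the course’s implementation, and the names critic and gp_weight (the penalty weight \lambda) are assumptions, with generator and latent_dim reused from before:

gp_weight = 10.0  # assumed penalty weight (lambda)

def critic_loss(critic, generator, real_images):
    batch_size = tf.shape(real_images)[0]
    z = tf.random.normal(shape=(batch_size, latent_dim))
    fake_images = generator(z)

    # Wasserstein part: E[D(fake)] - E[D(real)], to be minimised by the critic.
    wasserstein = tf.reduce_mean(critic(fake_images)) - tf.reduce_mean(critic(real_images))

    # Gradient penalty, evaluated on random interpolations of real and fake images.
    eps = tf.random.uniform(shape=(batch_size, 1, 1, 1))
    interpolated = eps * real_images + (1 - eps) * fake_images
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = critic(interpolated)
    grads = tape.gradient(scores, interpolated)
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]))
    penalty = tf.reduce_mean((grad_norm - 1.0) ** 2)

    return wasserstein + gp_weight * penalty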
Schematic
Links
- Dongyu Liu (2021), TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks
- Jeff Heaton (2022), GANs for Tabular Synthetic Data Generation (7.5)
- Jeff Heaton (2022), GANs to Enhance Old Photographs Deoldify (7.4)