ACTL3143 & ACTL5111 Deep Learning for Actuaries
Lecture Outline
Traditional GANs
Training GANs
Conditional GANs
Image-to-image translation
Problems with GANs
Wasserstein GAN
An autoencoder takes an input (e.g. an image), maps it to a latent space via an encoder module, then decodes it back to an output with the same dimensions via a decoder module.
Source: Marcus Lautier (2022).
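To make the encoder/decoder split concrete, here is a minimal Keras sketch (the layer sizes and latent dimension are illustrative assumptions, not from the lecture):
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 32  # illustrative size of the latent space

# Encoder: image -> latent vector.
encoder = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(latent_dim, activation="relu")])

# Decoder: latent vector -> image with the same dimensions.
decoder = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28, 1))])

# Autoencoder: compose the two and train to reconstruct the input,
# e.g. autoencoder.fit(x, x, ...).
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")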
Try out https://www.whichfaceisreal.com.
Source: https://thispersondoesnotexist.com.
Source: Jeff Heaton (2021), Training a GAN from your Own Images: StyleGAN2.
Source: Thales Silva (2018), An intuitive introduction to Generative Adversarial Networks (GANs), freeCodeCamp.
Source: Google Developers, Overview of GAN Structure, Google Machine Learning Education.
How they best each other:
lrelu = layers.LeakyReLU(alpha=0.2)
discriminator = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(64, 3, strides=2, padding="same", activation=lrelu),
    layers.Conv2D(128, 3, strides=2, padding="same", activation=lrelu),
    layers.GlobalMaxPooling2D(),
    layers.Dense(1)])
discriminator.summary()
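Note the final Dense(1) layer has no activation, so the discriminator outputs an unbounded logit (which is why the loss function later uses from_logits=True). A quick sketch of scoring a batch with the model above:
import tensorflow as tf

# Score a batch of 16 random "images"; output shape is (16, 1).
fake_batch = tf.random.uniform((16, 28, 28, 1))
logits = discriminator(fake_batch)
probs = tf.sigmoid(logits)  # probability of "real", if needed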
latent_dim = 128
generator = keras.Sequential([
    layers.Dense(7 * 7 * 128, input_dim=latent_dim, activation=lrelu),
    layers.Reshape((7, 7, 128)),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation=lrelu),
    layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation=lrelu),
    layers.Conv2D(1, 7, padding="same", activation="sigmoid")])
generator.summary()
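As a sanity check on the upsampling path (Dense to 7×7, then two strided transposed convolutions doubling to 14×14 and 28×28), we can sample latent vectors and confirm the output shape matches the discriminator's input:
z = tf.random.normal((16, latent_dim))
fake_images = generator(z)
print(fake_images.shape)  # (16, 28, 28, 1)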
Lecture Outline
Traditional GANs
Training GANs
Conditional GANs
Image-to-image translation
Problems with GANs
Wasserstein GAN
First step: Training discriminator:
Second step: Training generator:
# Separate optimisers for discriminator and generator.
d_optimizer = keras.optimizers.Adam(learning_rate=0.0003)
g_optimizer = keras.optimizers.Adam(learning_rate=0.0004)

# Instantiate a loss function.
loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(real_images):
    # Sample random points in the latent space.
    random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))

    # Decode them to fake images.
    generated_images = generator(random_latent_vectors)

    # Combine them with real images.
    combined_images = tf.concat([generated_images, real_images], axis=0)

    # Assemble labels discriminating real (1) from fake (0) images.
    labels = tf.concat([
        tf.zeros((batch_size, 1)),
        tf.ones((real_images.shape[0], 1))], axis=0)

    # Add random noise to the labels - an important trick!
    labels += 0.05 * tf.random.uniform(labels.shape)

    # Train the discriminator.
    with tf.GradientTape() as tape:
        predictions = discriminator(combined_images)
        d_loss = loss_fn(labels, predictions)
    grads = tape.gradient(d_loss, discriminator.trainable_weights)
    d_optimizer.apply_gradients(zip(grads, discriminator.trainable_weights))

    # Sample random points in the latent space.
    random_latent_vectors = tf.random.normal(shape=(batch_size, latent_dim))

    # Assemble labels that say "all real images".
    misleading_labels = tf.ones((batch_size, 1))

    # Train the generator (note that we should *not* update the weights
    # of the discriminator)!
    with tf.GradientTape() as tape:
        predictions = discriminator(generator(random_latent_vectors))
        g_loss = loss_fn(misleading_labels, predictions)
    grads = tape.gradient(g_loss, generator.trainable_weights)
    g_optimizer.apply_gradients(zip(grads, generator.trainable_weights))

    return d_loss, g_loss, generated_images
# Prepare the dataset.
# We use both the training & test MNIST digits.
batch_size = 64
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
all_digits = np.concatenate([x_train, x_test])
all_digits = all_digits.astype("float32") / 255.0
all_digits = np.reshape(all_digits, (-1, 28, 28, 1))
dataset = tf.data.Dataset.from_tensor_slices(all_digits)
dataset = dataset.shuffle(buffer_size=1024).batch(batch_size)
# In practice you need at least 20 epochs to generate nice digits.
epochs = 1
save_dir = "./"
%%time
for epoch in range(epochs):
    for step, real_images in enumerate(dataset):
        # Train the discriminator & generator on one batch of real images.
        d_loss, g_loss, generated_images = train_step(real_images)

        # Logging.
        if step % 200 == 0:
            # Print metrics.
            print(f"Discriminator loss at step {step}: {float(d_loss):.2f}")
            print(f"Adversarial loss at step {step}: {float(g_loss):.2f}")
        break  # Remove this if really training the GAN
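Note that save_dir is defined above but never used in this loop. To monitor progress visually, one option is to occasionally save a generated sample, e.g. inside the if step % 200 == 0 block; a minimal sketch (the filename pattern is an assumption):
# Save one generated sample to `save_dir` to track training progress.
# (keras.utils.array_to_img; in older versions it lives at
# keras.preprocessing.image.array_to_img.)
img = keras.utils.array_to_img(generated_images[0] * 255.0, scale=False)
img.save(f"{save_dir}generated_img_{step}.png")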
Lecture Outline
Traditional GANs
Training GANs
Conditional GANs
Image-to-image translation
Problems with GANs
Wasserstein GAN
Source: Sharon Zhou, Conditional Generation: Intuition, Build Basic Generative Adversarial Networks (Week 4), DeepLearning.AI on Coursera.
Lecture Outline
Traditional GANs
Training GANs
Conditional GANs
Image-to-image translation
Problems with GANs
Wasserstein GAN
Source: DeOldify package.
Lecture Outline
Traditional GANs
Training GANs
Conditional GANs
Image-to-image translation
Problems with GANs
Wasserstein GAN
StyleGAN2-ADA training times on V100s (1024×1024):

| GPUs | 1000 kimg | 25000 kimg | sec/kimg | GPU mem | CPU mem |
|------|-----------|------------|----------|---------|---------|
| 1    | 1d 20h    | 46d 03h    | 158      | 8.1 GB  | 5.3 GB  |
| 2    | 23h 09m   | 24d 02h    | 83       | 8.6 GB  | 11.9 GB |
| 4    | 11h 36m   | 12d 02h    | 40       | 8.4 GB  | 21.9 GB |
| 8    | 5h 54m    | 6d 03h     | 20       | 8.3 GB  | 44.7 GB |
Source: NVIDIA’s GitHub, StyleGAN2-ADA — Official PyTorch implementation.
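As a rough consistency check on the table: at 158 sec/kimg on one GPU,
25\,000 \text{ kimg} \times 158 \text{ s/kimg} = 3\,950\,000 \text{ s} \approx 45.7 \text{ days},
which matches the quoted 46d 03h (the per-kimg rate is rounded).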
Converges to a Nash equilibrium… if it converges at all.
Source: Lilian Weng (2019), From GAN to WGAN, arXiv.
Sources: Metz et al. (2017), Unrolled Generative Adversarial Networks, and Randall Munroe (2007), xkcd #221: Random Number.
# Separate optimisers for discriminator and generator.
d_optimizer = keras.optimizers.Adam(learning_rate=0.0003)
g_optimizer = keras.optimizers.Adam(learning_rate=0.0004)
Source: Thales Silva (2018), An intuitive introduction to Generative Adversarial Networks (GANs), freeCodeCamp.
Conv2D
GlobalMaxPool2D
Conv2DTranspose
Sources: Pröve (2017), An Introduction to different Types of Convolutions in Deep Learning, and Peltarion Knowledge Center, Global max pooling 2D.
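A small shape experiment illustrating the three layer types (sizes are illustrative): Conv2D with strides=2 halves the spatial dimensions, Conv2DTranspose with strides=2 doubles them, and GlobalMaxPooling2D collapses each feature map to a single value.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 28, 28, 1))
down = layers.Conv2D(8, 3, strides=2, padding="same")(x)            # (1, 14, 14, 8)
up = layers.Conv2DTranspose(8, 3, strides=2, padding="same")(down)  # (1, 28, 28, 8)
pooled = layers.GlobalMaxPooling2D()(up)                            # (1, 8)
print(down.shape, up.shape, pooled.shape)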
Source: Sharon Zhou, Problem with BCE Loss, Build Basic Generative Adversarial Networks (Week 3), DeepLearning.AI on Coursera.
Source: Lilian Weng (2019), From GAN to WGAN, arXiv.
Lecture Outline
Traditional GANs
Training GANs
Conditional GANs
Image-to-image translation
Problems with GANs
Wasserstein GAN
The generator is trying to minimise the distance between the distribution of generated samples and the distribution of real data.
Training a vanilla GAN (with an optimal discriminator) is equivalent to minimising the Jensen–Shannon divergence between the two.
An alternative distance between distributions is the Wasserstein distance.
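A standard illustration of why this matters (from the WGAN literature): if P = \delta_0 and Q = \delta_\theta are point masses, then \text{JS}(P, Q) = \log 2 for every \theta \neq 0, giving the generator no gradient signal as \theta \to 0, whereas W(P, Q) = |\theta| decreases smoothly, so the generator always receives a useful gradient.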
The critic D : \text{Input} \to \mathbb{R} scores how “authentic” the input looks; unlike a discriminator, it does not classify real vs. fake exactly.
The critic’s goal is
\max_{D \in \mathscr{D}} \mathbb{E}[ D(X) ] - \mathbb{E}[ D(G(Z)) ]
where \mathscr{D} is the space of 1-Lipschitz functions. Enforce this either by clipping the critic’s weights (original WGAN) or by penalising gradient norms far from 1 (WGAN-GP):
\max_{D} \mathbb{E}[ D(X) ] - \mathbb{E}[ D(G(Z)) ] - \lambda \, \mathbb{E} \Bigl[ \bigl( \| \nabla D(\hat{X}) \| - 1 \bigr)^2 \Bigr] ,
where \hat{X} interpolates between real and generated samples.
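A minimal Keras sketch of the gradient-penalty term, following the WGAN-GP recipe (it assumes a critic model and batches of real/fake images like those defined earlier):
import tensorflow as tf

def gradient_penalty(critic, real_images, fake_images):
    # Interpolate randomly between real and fake images.
    alpha = tf.random.uniform((tf.shape(real_images)[0], 1, 1, 1))
    interpolated = alpha * real_images + (1.0 - alpha) * fake_images
    # Differentiate the critic's score with respect to its *input*.
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = critic(interpolated)
    grads = tape.gradient(scores, interpolated)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]))
    # Penalise gradient norms far from 1.
    return tf.reduce_mean((norms - 1.0) ** 2)

# Critic loss to minimise: -(E[D(X)] - E[D(G(Z))]) + lambda * gradient_penalty(...).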
Source: Côté et al. (2020), Synthesizing Property & Casualty Ratemaking Datasets using Generative Adversarial Networks, working paper.