© The University of Leeds 2021
Generative Models
Artificial Intelligence, COMP5623M
2021-22
Introduction
Aim to build a generator of images from some domain
(e.g., outdoor scenes, faces) based only on:
A training set of images from the domain.
[Figure: a deep CNN (the generator) maps a random vector to an image of outdoor scenes, a point within the space of all possible images]
Input vectors
Input vectors are samples drawn from some distribution over ℝ^n, where n is typically
between 100 and 512.
The vectors are not observed (i.e., they are not specified in our dataset) and are
therefore referred to as latent.
In recent work on image generation, the distribution decomposes so that the
components of each sample are drawn independently from the same 1-D probability
distribution. Two common choices for this distribution:
z_i ~ N(0, 1): the normal (Gaussian) distribution with mean 0 and standard deviation 1
z_i ~ U(−1, 1): the uniform distribution between −1 and +1
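Both choices of latent distribution can be sketched in a few lines of NumPy. This is an illustrative sample step only; the dimensionality and batch size below are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100     # latent dimensionality (typically between 100 and 512)
batch = 16  # number of latent vectors sampled at once

# Each component drawn independently from N(0, 1)
z_normal = rng.standard_normal((batch, n))

# Each component drawn independently from U(-1, 1)
z_uniform = rng.uniform(-1.0, 1.0, size=(batch, n))

print(z_normal.shape, z_uniform.shape)  # (16, 100) (16, 100)
```

Each row is one latent vector z, ready to be fed to the generator.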
Example - DCGAN
Radford et al., Unsupervised representation learning with deep
convolutional generative adversarial networks. ICLR, 2016
In experiments, the architecture of DCGAN varies according
to the target size determined by the source dataset.
Here is the architecture of DCGAN used for training on the
MNIST dataset, where the target size is 28x28x1.
Input: z ∈ ℝ^100, each component z_i ~ N(0, 1)
Fully connected layer, 100 → 7x7x256; output vector reshaped to 7x7x256
Batch normalisation + Leaky ReLU
5x5 conv, 128 filters → 7x7x128
Batch normalisation + Leaky ReLU
5x5 conv, 64 filters, fractional stride 1/2 → 14x14x64
Batch normalisation + Leaky ReLU
5x5 conv, 1 filter, fractional stride 1/2 → 28x28x1 (output image)
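The spatial sizes above can be checked by following one helper function through the layers. This is a shape-arithmetic sketch only (assuming 'same' padding throughout, as in the architecture above); it does not build the network.

```python
import numpy as np

def conv_out(size, stride):
    """Spatial size after a 'same'-padded conv: integer stride n shrinks
    the input by n; fractional stride 1/n enlarges it by n."""
    if stride >= 1:
        return int(np.ceil(size / stride))
    return int(round(size / stride))

# Follow the spatial size through the generator layers
s = 7                 # after the FC layer, reshaped to 7x7x256
s = conv_out(s, 1)    # 5x5 conv, 128 filters, stride 1        -> 7
s = conv_out(s, 0.5)  # 5x5 conv, 64, fractional stride 1/2    -> 14
s = conv_out(s, 0.5)  # 5x5 conv, 1, fractional stride 1/2     -> 28
print(s)  # 28
```

The final 28x28x1 matches the MNIST target size.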
Fractionally-strided convolution
Convolution (padded), with a stride of n, makes output n times
smaller than input.
Convolution with a fractional stride of 1/n makes the output n times larger by simply
inserting (n−1) rows and columns of zeros between the rows and columns of the input
before convolving.
[Figure: 3x3 convolution with stride 2 and zero padding vs. 3x3 convolution with fractional stride 1/2]
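The zero-insertion step can be made concrete with NumPy. This sketches only the upsampling; the subsequent padded convolution (not shown) determines the final output size.

```python
import numpy as np

def zero_insert(x, n=2):
    """Insert (n-1) rows and columns of zeros between the rows and
    columns of x, as done before convolving with fractional stride 1/n."""
    h, w = x.shape
    out = np.zeros((n * h - (n - 1), n * w - (n - 1)), dtype=x.dtype)
    out[::n, ::n] = x  # original values land on every n-th position
    return out

x = np.array([[1, 2],
              [3, 4]])
print(zero_insert(x))
# [[1 0 2]
#  [0 0 0]
#  [3 0 4]]
```

Convolving this expanded grid then fills in the zero positions with learned interpolations.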
Leaky ReLU
[Figure: ReLU vs. Leaky ReLU activation curves]
Close to ReLU, but avoids backpropagation of zero gradients for negative inputs.
Constructing a loss function for the generator
There is no paired data of the form (x_i, y_i) that would allow supervised learning.
All we have is a collection of images {x_i} from some domain. How can we define a loss?
Solution: use a CNN classifier (the discriminator) to produce a loss for training the generator.
The discriminator outputs a probability D(x) that a given image x is real.
Thus, 1 − D(x) is the probability that x is fake.
Then use cross-entropy with target 'real' for all fake images.
Updating the discriminator
Update the discriminator parameters θ_d from a minibatch of:
• m real images from the chosen domain {x^(1), ..., x^(m)}
• m fake images from the generator {G(z^(1)), ..., G(z^(m))}, obtained from random latent vectors {z^(1), ..., z^(m)}
Log likelihood to be maximised:
(1/m) Σ_{i=1}^{m} [ log D(x^(i)) + log(1 − D(G(z^(i)))) ]
This is the standard supervised setting with cross-entropy loss on a two-class (real/fake) output distribution.
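The discriminator's objective can be evaluated directly given its outputs on a minibatch. The probability values below are hypothetical, chosen only to illustrate the formula.

```python
import numpy as np

def discriminator_log_likelihood(d_real, d_fake):
    """(1/m) * sum_i [ log D(x_i) + log(1 - D(G(z_i))) ] -- to be maximised."""
    return np.mean(np.log(d_real) + np.log(1.0 - d_fake))

# Hypothetical discriminator outputs on a minibatch of size m = 3
d_real = np.array([0.9, 0.8, 0.95])   # D(x) for real images (high = correct)
d_fake = np.array([0.1, 0.2, 0.05])   # D(G(z)) for fake images (low = correct)

good = discriminator_log_likelihood(d_real, d_fake)
bad = discriminator_log_likelihood(np.full(3, 0.5), np.full(3, 0.5))
print(good > bad)  # True: a confident, correct discriminator scores higher
```

Maximising this log likelihood is equivalent to minimising the cross-entropy loss on the two-class (real/fake) labels.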
DCGAN discriminator
5x5 conv, 64 filters, stride 2
Leaky ReLU
5x5 conv, 128 filters, stride 2
Leaky ReLU
Fully connected layer, * → 1
Output: probability the input is a real image
Updating the generator
[Figure: random vector → Generator → image → Discriminator → probability real → cross-entropy loss]
Log likelihood to be maximised:
(1/m) Σ_{i=1}^{m} log D(G(z^(i)))
where D(G(z^(i))) is the probability a fake image is real, and the discriminator
parameter values remain fixed.
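The generator's objective depends only on the discriminator's outputs for fakes; again the probability values below are hypothetical examples.

```python
import numpy as np

def generator_log_likelihood(d_fake):
    """(1/m) * sum_i log D(G(z_i)) -- maximised w.r.t. generator parameters,
    with the discriminator's parameters held fixed."""
    return np.mean(np.log(d_fake))

# D(G(z)) for two hypothetical minibatches of fakes
high = generator_log_likelihood(np.array([0.7, 0.8, 0.6]))  # D is fooled
low = generator_log_likelihood(np.array([0.1, 0.2, 0.1]))   # D spots the fakes
print(high > low)  # True: the generator's payoff grows as it fools D
```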
Linked training
The key insight is to train the generator and discriminator iteratively.
Thus, the generator and discriminator improve together, minibatch by minibatch.
Initialise the generator and discriminator networks randomly from a normal distribution
Repeat num_epoch times:
    Repeat on each batch of real images:
        Sample from the latent distribution and use the generator to produce a batch of fake images
        Update the discriminator to increase the log likelihood on the output from real and fake images (cross-entropy loss)
        Pass the fake images through the current discriminator to generate output probabilities
        Backpropagate the gradient through discriminator and generator, and update the generator to increase the log likelihood of 'real' for the fake images
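The alternating structure of the loop can be sketched as follows. The update steps are stand-ins (real code would hold the two networks and backpropagate); only the control flow and the latent sampling are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

num_epochs, batches_per_epoch = 2, 3
m, n = 8, 100  # minibatch size, latent dimensionality

log = []  # record of the alternating update steps

for epoch in range(num_epochs):
    for batch in range(batches_per_epoch):
        # Sample latent vectors; conceptually, fakes = G(z)
        z = rng.standard_normal((m, n))
        # 1) Discriminator step: real + fake minibatch, cross-entropy loss
        log.append("D")
        # 2) Generator step: pass fakes through D, update G with target 'real'
        log.append("G")

print(log[:4])  # ['D', 'G', 'D', 'G'] -- strictly alternating, minibatch by minibatch
```

Because the two updates alternate every minibatch, neither network gets far ahead of the other.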
Adversarial setting
Generator and Discriminator can be viewed as adversaries in a zero-sum game. Hence Generative Adversarial Network.
Payoff to the discriminator is given by:
V(θ_g, θ_d) = E_{x~p_data}[ log D(x) ] + E_{x~p_model}[ log(1 − D(x)) ]
where θ_g and θ_d are the parameter values of the generator and discriminator.
The first term is estimated as the average of log D(x) over a batch of real images;
the second as the average of log(1 − D(G(z))) over a batch of fake images.
Payoff to the generator is −V(θ_g, θ_d).
Making alternate updates to their parameter values, the discriminator and generator each seek to maximise their own payoff.
The game continues until we get an optimal generator G* that achieves the smallest of the maximum payoffs achievable
by the discriminator:
G* = arg min_{θ_g} max_{θ_d} V(θ_g, θ_d)
In practice, updating the generator to increase −V(θ_g, θ_d) doesn't work as well as seeking to increase:
E_{x~p_model}[ log D(x) ]
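The reason is gradient saturation: early in training the discriminator rejects fakes easily, so D(G(z)) is near 0, and the gradient of log(1 − D) with respect to D is tiny compared to the gradient of log D. A small numerical illustration (the value of D(G(z)) below is a hypothetical early-training example):

```python
import numpy as np

d = 0.01  # D(G(z)) early in training: fakes are easily spotted

# Original objective: generator decreases log(1 - D(G(z)))
grad_saturating = -1.0 / (1.0 - d)  # d/dD of log(1 - D), approx -1.01
# Alternative: generator increases log D(G(z))
grad_nonsaturating = 1.0 / d        # d/dD of log(D), equals 100

print(abs(grad_nonsaturating) / abs(grad_saturating))  # ~99x stronger signal
```

So the alternative objective gives the generator a much larger gradient exactly when it needs it most.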
Experiments with DCGAN
[Figure: fake images from a generator trained on 3 million images of bedrooms; training progress on MNIST over 50 epochs]
From https://www.tensorflow.org/tutorials/generative/dcgan
Tours of the output space
Sample nine random input vectors, move between
these, generating fake images at equal steps.
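Moving in equal steps between two latent vectors is just linear interpolation in ℝ^n; feeding each intermediate vector to the generator produces the tour. A sketch of the interpolation step (the number of steps and dimensionality are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # latent dimensionality

z0, z1 = rng.standard_normal(n), rng.standard_normal(n)

# Equal steps from z0 to z1; generating an image at each z_t gives a
# smooth tour between the two corresponding fake images
steps = 8
ts = np.linspace(0.0, 1.0, steps)
tour = np.array([(1 - t) * z0 + t * z1 for t in ts])

print(tour.shape)  # (8, 100): first row is z0, last row is z1
```

The same idea extends to a tour visiting all nine sampled vectors in turn.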
BigGAN-deep
A generative model conditioned on a class label.
Brock et al., Large Scale GAN Training for High Fidelity Natural Image Synthesis, ICLR 2019
• Embedding of the class label (size 128)
• z ∈ ℝ^128, each component sampled from a normal (Gaussian) distribution
• Linear mapping (goes beyond a regular fully-connected layer), output reshaped into 4x4x16
• Residual blocks with multiple convolutional layers, upsampling and batch normalisation
• Output: 256 × 256 image
[Figure: some 256x256-resolution images generated after training on ImageNet]

