Date: 2023-03-30
ST 311 Assignment 4 (theory part)
Due by 5pm, 4 April, 2023
Candidate number:
Instructions: Attempt all questions. The total is 50 marks.
1. Use pseudocode to summarize the minibatch stochastic gradient descent algorithms
for generative adversarial networks (GANs) and Wasserstein GANs. Discuss the main
difference between the two algorithms. [24 marks]
2. Recall that the Bellman equation for the Markov reward process (MRP) yields
V(s) = E[G_t | S_t = s] = E[R_{t+1} + γ G_{t+1} | S_t = s] = E[R_{t+1} + γ V(S_{t+1}) | S_t = s].
(a) Suppose X, Y, Z are random variables. Please show that
E[E[X|Y, Z]|Y = y] = E[X|Y = y]. (1)
[13 marks]
(b) With the help of (1), please prove that the Bellman equation holds for MRPs.
Hint: you may also need to use the Markov property of MRPs, i.e. ‘the future is
independent of the past given the present.’ [13 marks]
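As a numerical sanity check (not a proof, and not a substitute for the derivation asked for in Question 2), the sketch below evaluates a small MRP and confirms that the resulting V satisfies V(s) = E[R_{t+1} + γ V(S_{t+1}) | S_t = s]. The two-state transition matrix P, the expected rewards R, and the discount GAMMA are hypothetical values chosen only for illustration; the function names are likewise assumptions.

```python
# Hypothetical 2-state MRP; every number here is an illustrative assumption.
GAMMA = 0.9
P = [[0.8, 0.2],
     [0.3, 0.7]]   # P[s][u] = Pr(S_{t+1} = u | S_t = s)
R = [1.0, -1.0]    # R[s] = E[R_{t+1} | S_t = s]


def bellman_backup(V):
    """Right-hand side of the Bellman equation applied to a value vector V."""
    n = len(V)
    return [R[s] + GAMMA * sum(P[s][u] * V[u] for u in range(n))
            for s in range(n)]


def evaluate(tol=1e-12, max_iter=10_000):
    """Iterate the backup from V = 0 until successive iterates differ by < tol."""
    V = [0.0] * len(R)
    for _ in range(max_iter):
        V_new = bellman_backup(V)
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


V = evaluate()
# Largest gap between V(s) and E[R_{t+1} + gamma * V(S_{t+1}) | S_t = s].
residual = max(abs(v - b) for v, b in zip(V, bellman_backup(V)))
print("V =", V)
print("Bellman residual =", residual)
```

Because the backup is a contraction for γ < 1, the iteration converges to the unique fixed point, so the residual should be close to zero up to floating-point error.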