FIT3154 Assignment 2

Due Date: 11:59PM, Sunday, 17/10/2021

Introduction

There are a total of two questions worth 15 + 19 = 34 marks in this assignment. This assignment is worth a total of 20% of your final mark, subject to hurdles and any other matters (e.g., late penalties, special consideration, etc.) as specified in the FIT3154 Unit Guide or elsewhere on the FIT3154 Moodle site (including Faculty of I.T. and Monash University policies).

Students are reminded of the Academic Integrity Awareness Training Tutorial Activity and, in particular, of Monash University's policies on academic integrity. In submitting this assignment, you acknowledge your awareness of Monash University's policies on academic integrity and that the work is done and submitted in accordance with these policies.

Submission: No files are to be submitted via e-mail. Correct files are to be submitted to Moodle, as given above. You must submit the following files:

1. One PDF file containing non-code answers to all the questions that require written answers. This file should also include all your plots, and must be clearly written and formatted.

2. The required R file containing your R answers.

Please read these submission instructions carefully and take care to submit the correct files in the correct places.

Question 1 (15 marks)

In this question we will again examine Bayesian inference of the geometric distribution, but this time using a different prior distribution. Recall that the geometric distribution is a distribution used to model counts, and has the probability mass function

    p(y | µ) = µ^y (1 + µ)^(-y-1),   y ∈ {0, 1, 2, ...}    (1)

where µ > 0 is the mean parameter of the distribution. In Assignment 1 we used a beta-prime prior distribution, as this resulted in a simple posterior. This time we will use a special heavy-tailed prior distribution which we will call the squared half-Cauchy; this has the probability density

    π(µ | s) = 1 / (√s π √µ (1 + µ/s)),    (2)

where s > 0 is a scale parameter that can be used to encode our prior information. If a RV X follows a squared half-Cauchy with scale s, we write X ~ C^2_+(0, s). Our hierarchy is then

    y_i | µ ~ Geo(µ),  i = 1, ..., n    (3)
    µ | s ~ C^2_+(0, s)    (4)

where s can be thought of as setting the value of the "best guess" of µ before we see the data. For reference, the analysis in Assignment 1 used a beta-prime prior distribution with hyperparameters a ≈ 80.4 and b ≈ 6.74, which encoded a prior best guess for µ of 14. The posterior mean was E[µ | y] ≈ 15.94 and the 95% credible interval was ≈ (14.416, 17.616).

Download the file geo.shc.csv. This contains the output of a Stan implementation of the hierarchy (3)-(4), run with s = 14. A total of 20,000 samples of µ drawn from the posterior distribution are provided.

1. Plot the histogram of the samples, and report the posterior mean and the 95% credible interval; compare these to the posterior mean and interval found using the beta-prime prior from Assignment 1 (summarised above). A sketch of one way to do this appears after question 3 below. [3 marks]

We will now look at the maximum a posteriori (MAP) estimator, which is found by maximising the posterior, or equivalently minimising the negative log-posterior, i.e.,

    µ̂_MAP = argmin_µ { -log [p(y | µ) π(µ | s)] }    (5)

Please answer the following questions:

2. Write down the formula for the negative log-posterior, i.e., -log p(y | µ)π(µ | s), up to constants not dependent on µ. Make sure to simplify the expression. [2 marks]

3. Prove that maximising the posterior is equivalent to solving

    (2n + 3)µ^2 + (s(2n + 1) - 2Y + 3)µ + s(1 - 2Y) = 0    (6)

where Y = Σ_{i=1}^n y_i. [2 marks]
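By way of illustration, question 1 above amounts to a histogram plus two posterior summaries. The following is a minimal sketch; the assumption that the samples sit in the first column of geo.shc.csv should be checked against the actual file:

    # Sketch for question 1: summarise the posterior samples of mu.
    # Assumes the samples are in the first column of geo.shc.csv.
    samples <- read.csv("geo.shc.csv")[, 1]

    hist(samples, breaks = 50, xlab = "mu", main = "Posterior samples of mu")
    mean(samples)                       # posterior mean
    quantile(samples, c(0.025, 0.975))  # 95% credible interval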
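Note also that equation (6) is simply a quadratic in µ, so the MAP estimate can be obtained from the quadratic formula, keeping the positive root. The provided geo.map() function (introduced in question 4 below) already does this for you; the sketch below, with the hypothetical name solve.map(), is only to illustrate the idea:

    # Sketch: solve the quadratic (6) for mu.
    solve.map <- function(y, s) {
      n  <- length(y)
      Y  <- sum(y)
      a  <- 2*n + 3
      b  <- s*(2*n + 1) - 2*Y + 3
      c0 <- s*(1 - 2*Y)
      roots <- (-b + c(-1, 1)*sqrt(b^2 - 4*a*c0)) / (2*a)
      max(roots)  # with Y > 0 the roots have opposite signs; keep the positive one
    }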
4. Download the file geo.map.R. This contains the function geo.map(y,s), which finds the MAP estimator by solving equation (6). Using this function and the data in covid.2021.csv that we analysed in Assignment 1, compute the MAP estimator for s = 14 (our prior best guess based on SARS recovery). Compare this estimate to the posterior mean under the squared half-Cauchy prior found in Question 1.1. Discuss why you think they are similar/different. [1 mark]

5. Using the function geo.map(y,s), explore the sensitivity of the MAP estimator for our Covid-19 data when using choices of prior best guess that are quite different from our guess based on the SARS recovery time. In particular, try the values s ∈ {10^-4, 10^-3, ..., 1, ..., 10^6, 10^7}. Plot the MAP estimates against these values of s (a sketch of one way to set this up appears after question 8 below). How do these estimates compare to using s = 14? What does this say about the squared half-Cauchy prior? [2 marks]

6. Find the (asymptotic) formula for the MAP estimate of µ, i.e., the solution to (6), when (i) s → 0 and (ii) s → ∞. Comment on these. [2 marks]

As discussed in Assignment 1, occasionally the alternative parameterisation v = log µ is used for the geometric distribution.

7. Transform the squared half-Cauchy prior (2) on µ to a prior on v using the transformation of random variables technique from Lecture 5. [2 marks]

8. Plot this prior distribution for the choices of hyperparameter s = 10^-1, s = 1 and s = 10. How does s affect the prior on v? [1 mark]
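For question 5, one possible shape for the sensitivity study is sketched below. It assumes the counts are in the first column of covid.2021.csv and that geo.map.R has been sourced:

    # Sketch for question 5: MAP estimates across a wide grid of prior scales.
    source("geo.map.R")
    y <- read.csv("covid.2021.csv")[, 1]  # assumed layout: counts in column 1

    s.grid  <- 10^seq(-4, 7)
    map.est <- sapply(s.grid, function(s) geo.map(y, s))

    # A log-scale x-axis makes the wide range of s values readable.
    plot(s.grid, map.est, log = "x", type = "b",
         xlab = "prior scale s", ylab = "MAP estimate of mu")
    abline(h = geo.map(y, 14), lty = 2)  # reference: SARS-based guess s = 14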
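For questions 7 and 8, a simulation can serve as a sanity check on the density you derive: as the name suggests, a draw from (2) can be generated as µ = s T^2 with T a standard Cauchy variate. This identity is an observation of ours, not something stated in the assignment, so verify it against (2) before relying on it. Transforming draws then shows how s shapes the prior on v:

    # Sketch: simulation-based sanity check for the prior on v = log(mu).
    # Assumes (unverified here) that mu = s * T^2, T standard Cauchy,
    # has the density given in (2).
    set.seed(1)
    v.draws <- function(s, n = 1e6) log(s * rcauchy(n)^2)

    plot(density(v.draws(1)), xlim = c(-15, 15), xlab = "v",
         main = "Prior on v = log(mu)")
    lines(density(v.draws(1e-1)), lty = 2)
    lines(density(v.draws(10)), lty = 3)
    legend("topright", c("s = 1", "s = 0.1", "s = 10"), lty = 1:3)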
Question 2: Geometric Regression (19 marks)

In this question we are going to examine an extension of the geometric distribution to a regression setting. Let y = (y_1, ..., y_n) be n non-negative integer observations that we would like to model, and let x_i = (x_{i,1}, ..., x_{i,p}) be a vector of p predictors associated with the i-th data point. Geometric regression models extend the usual geometric distribution in the following way:

    y_i | x_{i,1}, ..., x_{i,p} ~ Geo(µ_i)    (7)
    µ_i = exp( β_0 + Σ_{j=1}^p β_j x_{i,j} )

where β = (β_1, ..., β_p) are the coefficients relating the predictors to the target, β_0 is the intercept, and the geometric distribution has probability mass function given by (1). In this way the geometric regression model extends the regular linear regression model to modelling non-negative integer data. The effects of the predictors are additive on the log scale, and therefore multiplicative on the original scale (note: this is very similar to the way the predictors affect the target in the Poisson regression model discussed in Lecture 8).

Part I

The first part of this question will examine using the geometric regression model. Download the file diabetes.ass2.2021.csv. This contains data regarding diabetes progression for n = 442 individuals. The target, Y, is an integer measure of diabetes progression, with a higher score indicating worse progression.

1. Use the bayesreg() function to fit a geometric regression to the diabetes data, with Y as the target, using Bayesian ridge regression. Use at least 10,000 burn-in samples and draw at least 20,000 samples (hint: use ?bayesreg to figure out the appropriate options; a usage sketch also appears at the end of this document). Use summary() to inspect the fitted model. From this model:

(a) which variables do you think are associated with disease progression, and why? Which appears to be the strongest predictor of disease progression, and why? [2 marks]

(b) how do sex and blood pressure (bp) affect diabetes progression? (note: sex is coded as 0=male, 1=female, and bp is measured in millimetres of mercury (mmHg)) [2 marks]

(c) provide a 95% credible interval for the effect of sex on diabetes progression. [1 mark]

2. In Studio 8, when we used an additive model and treated the targets as Gaussian, we found that S5 had a non-linear relationship with Y. Refit the model including the square root of S5 as an additional predictor; from the output of bayesreg(), justify whether or not this new predictor appears to be associated with Y. [1 mark]

Part II

There is no simple closed-form formula for the maximum likelihood estimates for a geometric regression – in fact, this type of model is not even implemented in the standard R command glm(). The second part of this question will therefore require you to write code to fit a geometric regression using maximum likelihood and gradient descent.

To get you started, I have provided a script called georeg.gd.R which contains three functions: georeg.gd(), df2matrix() and my.standardise(). The main function, georeg.gd(), is a skeleton that takes a y and X, standardises the predictors (as per Lecture 3) using my.standardise(), and then has space for you to fill in with your gradient descent code. Once your code has finished running, the function unstandardises the coefficients back to the original scale. The function df2matrix() takes a data frame and a formula and returns an appropriate matrix of predictors X and vector of targets y. Answer the following questions:

3. Write down the expression for the negative log-likelihood of the data y = (y_1, ..., y_n) under the geometric regression model (7), with predictor matrix X of size n × p, coefficient vector β and intercept β_0. [2 marks]

4. Find expressions for the gradient of the negative log-likelihood with respect to the intercept β_0 and the coefficients β_1, ..., β_p. [4 marks]

5. Write an R function that takes a vector of data y, a matrix of predictors X, an intercept beta0 and a vector of coefficients beta, and returns: (i) the negative log-likelihood, and (ii) the gradients with respect to β_0 and the coefficients β_1, ..., β_p (a sketch of the overall shape appears after question 6 below). [2 marks]

6. Implement gradient descent to find the maximum likelihood estimates of the intercept β_0 and coefficients β_1, ..., β_p. Use the function georeg.gd() as a basis to get started. The function takes as arguments:

• an X matrix of predictors and a y vector of targets;
• an initial value for the learning rate κ;
• the size of the largest absolute value of the gradient used for termination (max.grad).

The function does some initialisation of the estimates of β_0 and β for you, using least squares on the log-transformed targets. These are good starting points for a search. You will need to write the code that finds the maximum likelihood estimates β̂_0 and β̂_1, ..., β̂_p for the X and y that were passed, using the gradient descent algorithm. Once it has finished, it should return a list containing:

• the ML estimate of the intercept β_0;
• the ML estimates of the coefficients β_1, ..., β_p;
• the negative log-likelihood at your final estimates;
• the gradient at your final estimates;
• the negative log-likelihood recorded at every iteration of your gradient descent loop.

Some further details:

• Your learning rate, κ, must be adaptive.
• Your algorithm should terminate when the largest absolute value of the gradients is less than the max.grad argument passed to your function.
• When writing your code, you should consider computational efficiency and ensure that it is cleanly written and commented.

[4 marks]
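For question 5, one way to structure the requested function is sketched below. The vectorised expressions follow from one possible derivation of questions 3 and 4; check them against your own working before relying on them:

    # Sketch for question 5: negative log-likelihood and gradients for the
    # geometric regression model (7).
    georeg.nll.grad <- function(y, X, beta0, beta) {
      eta <- beta0 + as.vector(X %*% beta)  # linear predictor; mu = exp(eta)
      mu  <- exp(eta)

      # Negative log-likelihood: sum_i (y_i + 1) log(1 + mu_i) - y_i eta_i
      nll <- sum((y + 1)*log1p(mu) - y*eta)

      # Gradient w.r.t. each eta_i, then chain rule through eta = beta0 + X beta.
      r <- (y + 1)*mu/(1 + mu) - y
      list(nll = nll, grad.beta0 = sum(r), grad.beta = as.vector(t(X) %*% r))
    }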
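For question 6, one common way to make the learning rate adaptive is a simple accept/reject scheme: shrink κ when a step fails to reduce the negative log-likelihood, and grow it gently when a step succeeds. The loop below illustrates that structure only; it reuses the hypothetical georeg.nll.grad() from the previous sketch and is not a complete solution to the question:

    # Sketch for question 6: gradient descent with a halving/growing step size.
    gd.sketch <- function(y, X, beta0, beta, kappa = 1e-3, max.grad = 1e-4) {
      g <- georeg.nll.grad(y, X, beta0, beta)
      nll.trace <- g$nll
      while (max(abs(c(g$grad.beta0, g$grad.beta))) > max.grad) {
        # Propose a step downhill from the current point.
        new0  <- beta0 - kappa*g$grad.beta0
        newb  <- beta  - kappa*g$grad.beta
        g.new <- georeg.nll.grad(y, X, new0, newb)
        if (g.new$nll < g$nll) {
          # Accept the step and cautiously increase the learning rate.
          beta0 <- new0; beta <- newb; g <- g.new
          kappa <- kappa*1.1
          nll.trace <- c(nll.trace, g$nll)
        } else {
          kappa <- kappa/2  # reject the step and shrink the learning rate
        }
      }
      list(beta0 = beta0, beta = beta, nll = g$nll,
           grad = c(g$grad.beta0, g$grad.beta), nll.trace = nll.trace)
    }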
7. Using the gradient descent code you have written, fit a geometric regression to the diabetes data you analysed previously. Compare your estimates to those obtained using the bayesreg() function. [1 mark]

At the end of this assignment, you should reflect on how much you have learned since you started this unit. You have now learned how to use the Bayesian approach to statistical inference, you have ideas on how to analyse non-Gaussian data using regression models, and you have even managed to write a program to fit such a model using maximum likelihood!
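Finally, the usage sketch promised in Part I, question 1. The argument names, and in particular whether the geometric model is selected with model = "geometric", are assumptions on our part; confirm them against ?bayesreg:

    # Sketch for Part I, question 1: Bayesian geometric ridge regression.
    # Option names (model, prior, burnin, n.samples) assumed -- see ?bayesreg.
    library(bayesreg)

    df  <- read.csv("diabetes.ass2.2021.csv")
    fit <- bayesreg(Y ~ ., data = df, model = "geometric", prior = "ridge",
                    burnin = 1e4, n.samples = 2e4)
    summary(fit)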