UNIVERSITY OF BRISTOL

Examination for the Degrees of B.Sc. and M.Sci. (Level C/4)

STATISTICS
MATH-10013/11400
(Paper Code MATH-10013)

May/June 2019, 1 hour and 30 minutes

This paper contains two sections: Section A and Section B. Each section should be answered in a separate answer book.

Section A contains five questions, ALL of which will be used for assessment. This section is worth 40% of the marks for the paper.

Section B contains two questions, ALL of which will be used for assessment. This section is worth 60% of the marks for the paper.

On this examination, the marking scheme is indicative and is intended only as a guide to the relative weighting of the questions.

Calculators of an approved type (non-programmable, no text facility) are allowed in this examination.

THIS PAPER MUST NOT BE REMOVED FROM THE EXAMINATION ROOM.

Do not turn over until instructed.

Math 10013–MayJune19

A1. (8 marks) Consider the test scores of 9 students:

    scores <- c(78, 93, 68, 84, 90, 74, 64, 55, 80)

  (a) Define the median and the lower and upper hinges $H_1$ and $H_3$ of a data set of size $2m + 1$.
  (b) Give a five-number summary of the test scores above.
  (c) Are there outliers, in the sense that there are points more extreme than $1.5(H_3 - H_1)$ away from the hinges? Is the data set skewed to the left or right? Briefly explain why.

A2. (8 marks) We are given observations $x_1, x_2, \ldots, x_n$ from a specified distribution with unknown parameter $\theta$. Consider a test of the null hypothesis $H_0 : \theta = \theta_0$ against the alternative $H_1 : \theta = \theta_1$. To fix ideas, assume that we have designed a test statistic $T(x_1, x_2, \ldots, x_n)$ such that large values of $T(x_1, x_2, \ldots, x_n)$ indicate inconsistency of the observations with $H_0$. Define the following terms precisely:
  (a) Critical region and critical value.
  (b) Type I error and type II error.
  (c) The significance level and the power of the test.
  (d) p-value.

A3. (8 marks) Let $X_1, X_2, \ldots$
, $X_{36}$ be a random sample of size 36 from a distribution corresponding to a continuous random variable $X$ of mean $E(X) = 2$ and variance $\mathrm{var}(X) = 4$.
  (a) State the normal approximation to the distribution of $\bar{X}$ given by the central limit theorem in terms of the cumulative distribution function $\Phi(\cdot)$ or a $N(0, 1)$ random variable.
  (b) Use this approximation to calculate $P(X_1 + X_2 + \cdots + X_n \quad 1.7)$. You should explain your reasoning carefully and may use the following R code output:

    pnorm(c(-5.86,5.84,-5.82,-5.78))
    ## [1] 2.314336e-09 1.000000e+00 2.942381e-09 3.735031e-09

  (c) Now assume $X_1, \ldots, X_{36}$ to be a random sample from a distribution corresponding to an integer-valued random variable $X$ of mean $E(X) = 2$ and variance $\mathrm{var}(X) = 4$. Find an improved approximation of $P(X_1 + X_2 + \cdots + X_n \quad 1.7)$. You may use the R code output above.

A4. (8 marks) Let $X_1, \ldots, X_n$ be a random sample of size $n$ from the $N(\mu, \sigma^2)$ distribution. Let $\bar{X} = \sum_{i=1}^{n} X_i / n$ and $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)$.
  (a) State the distribution of $\bar{X}$.
  (b) State the precise distributions of $\sum_{i=1}^{n} (X_i - \mu)^2 / \sigma^2$ and $\sum_{i=1}^{n} (X_i - \bar{X})^2 / \sigma^2$.
  (c) Using the definition of a t-distribution with $r$ degrees of freedom and the results above, establish the distribution of $\sqrt{n}\,(\bar{X} - \mu)/S$. You may assume any facts about the distribution of $\bar{X}$ and $\sum_{i=1}^{n} (X_i - \bar{X})^2$, provided you state them correctly.

A5. (8 marks) Let $x_1, x_2, \ldots, x_n$ be the observed values of a simple random sample from the Poisson distribution of parameter $\theta$, that is, if $X \sim \mathrm{Poisson}(\theta)$, then $P(X = x; \theta) = \theta^x \exp(-\theta)/x!$ for $x = 0, 1, 2, \ldots$ (and zero otherwise).
  (a) Find the maximum likelihood estimator of $\theta$.
  (b) Let $\tau(\theta) = P(X = 0; \theta) = \exp(-\theta)$. Find a maximum likelihood estimate of $\tau(\theta)$.

Remember to start a new answer book for Section B.

B1. Let $x_1, x_2, \ldots, x_n$ be the observed values of $X_1, X_2, \ldots$
, $X_n$ for $n \in \mathbb{N}$, assumed to be independent and identically distributed according to a Pareto distribution with unknown parameters $\alpha > 1$ and $\beta > 0$ and probability density function

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\alpha \beta^{\alpha}}{x^{\alpha + 1}} & \text{if } x \ge \beta, \\ 0 & \text{otherwise.} \end{cases}$$

It can be shown that the mean of this distribution is $\alpha\beta/(\alpha - 1)$.

  (a) (4 marks) We assume first that $\beta = 1$ is known. Derive the method of moments estimate $\hat{\alpha}_{\mathrm{mom}}$ of $\alpha$.
  (b) Still assuming $\beta = 1$ known:
      i. (4 marks) Derive the likelihood equation for this distribution, assuming $x_i \ge 1$ for $i = 1, \ldots, n$. You should explain every step of your reasoning.
      ii. (2 marks) Find the maximum likelihood estimator $\hat{\alpha}_{\mathrm{mle}}$ of $\alpha$. You should explicitly confirm that this is a maximum.
  (c) (3 marks) For either estimator $\hat{\alpha}_{\mathrm{mle}}$ or $\hat{\alpha}_{\mathrm{mom}}$ (call it $\hat{\alpha}(X_1, X_2, \ldots, X_n)$ for simplicity):
      i. what is the definition of the sampling distribution of $\hat{\alpha}(X_1, X_2, \ldots, X_n)$?
      ii. define the bias and mean squared error (mse) of $\hat{\alpha}(X_1, X_2, \ldots, X_n)$.
  (d) Given the expressions for $\hat{\alpha}_{\mathrm{mom}}$ and $\hat{\alpha}_{\mathrm{mle}}$ it seems unlikely that we can get expressions for the bias and mean squared error and we turn to a numerical simulation.

    true.alpha<-2
    xvalues <- rpareto(10000, 1, shape=true.alpha)
    xsamples <- matrix(xvalues, nrow=1000)
    log.xsamples <- log(xsamples)
    mean.xsamples <- apply(xsamples, 1, mean)
    mean.log.xsamples <- apply(log.xsamples, 1, mean)
    alpha.mom <- mean.xsamples/(mean.xsamples-1)
    alpha.mle <- 1/mean.log.xsamples
    boxplot(alpha.mom,alpha.mle, names= c("mom","mle"))
    abline(h=true.alpha, lty=2)

    [Figure: output of the code above, with boxes labelled "mom" and "mle" and a vertical axis running from 1 to 7.]

      i. (5 marks) Explain briefly in words what each line of the R code above is doing.
      ii. (3 marks) In the light of the graph above, which estimator seems to perform best? Do you think estimating $\alpha$ reliably is easy? You should justify your answer.
      iii. (2 marks) How would you estimate the bias and mean squared error of the estimators $\hat{\alpha}_{\mathrm{mom}}$ and $\hat{\alpha}_{\mathrm{mle}}$ when $\alpha = 2$?
      You can give your answer in the form of pseudo-code or simple R code.
  (e) Now we consider the situation where $\alpha > 1$ is known, but $\beta > 0$ is unknown and to be estimated.
      i. (2 marks) Find the method of moments estimate of $\beta$.
      ii. (3 marks) Find the maximum likelihood estimate of $\beta$ with $x_i > 0$ for $i = 1, \ldots, n$. You may use a graphical argument to explain your reasoning.
      iii. (2 marks) Find an expression for the sampling distribution of $\hat{\beta}_{\mathrm{mle}}$ in terms of the cdf of the Pareto distribution, $F(x; \alpha, \beta) = 1 - (\beta/x)^{\alpha}$ for $x \ge \beta$ and zero otherwise. You may start your reasoning by considering the survival function $P(\hat{\beta}_{\mathrm{mle}} > x; \alpha, \beta)$.

B2. Linear regression

  (a) Modelling assumptions and estimation
      i. (3 marks) In a standard linear regression model $Y_i = \alpha + \beta x_i + e_i$, what assumptions do we make about the residuals $\{e_1, \ldots, e_n\}$?
      ii. (2 marks) What is meant by least squares estimates $\hat{\alpha}$ and $\hat{\beta}$?
      iii. (2 marks) Define $ss_{xx}$ and $ss_{xy}$ as used in the least squares estimate $\hat{\beta} = ss_{xy}/ss_{xx}$. Give an expression for the least squares estimate $\hat{\alpha}$ in terms of $\hat{\beta}$ and the sample means $\bar{x}$ and $\bar{y}$.
  (b) Confidence interval and model testing
      i. (3 marks) Define precisely what a 95% confidence interval for $\alpha$ is. How would you explain this informally to a non-mathematical person?
      ii. (2 marks) What additional assumption does one usually make about the residuals $\{e_1, \ldots, e_n\}$ in order to derive a confidence interval for $\alpha$? Why are the assumptions for a standard linear regression model not sufficient?
      iii. (2 marks) Define $s^2_{\hat{\alpha}} = \hat{\sigma}^2 \left( 1/n + \bar{x}^2/ss_{xx} \right)$, where

$$\hat{\sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} x_i)^2.$$

      Assuming that with your modelling assumptions you can establish that $(\hat{\alpha} - \alpha)/s_{\hat{\alpha}} \sim t_{n-2}$, find a 95% confidence interval for $\alpha$. You should explain your reasoning carefully.
  (c) The cars data set contains the stopping distances (in feet) for different car speeds (in miles per hour) recorded in the 1920s.
      Here are the first 6 records:

    head(cars)
    ##   speed dist
    ## 1     4    2
    ## 2     4   10
    ## 3     7    4
    ## 4     7   22
    ## 5     8   16
    ## 6     9   10

      i. (2 marks) We type the following R commands:

    attach(cars)
    plot(speed,dist, xlab='speed (mph)',ylab='stopping distance (ft)')

    [Figure: output of the plot command above; horizontal axis "speed (mph)" from 5 to 25, vertical axis "stopping distance (ft)" from 0 to 120.]

      What is the name for this type of plot? What does it suggest?
      ii. (2 marks) Explain what the following R code does. What can you deduce from this plot?

    fit<-lm(dist~speed)
    plot(speed,dist)
    abline(fit)

    [Figure: output of the code above; horizontal axis "speed", vertical axis "dist".]

      iii. (2 marks) Explain how to estimate the residuals $\{e_1, \ldots, e_n\}$.
      iv. (3 marks) Explain what the following R code does and what the plot tells us. Explain why this is important.

    e.hat<-residuals(fit)
    qqnorm(e.hat/sd(e.hat))
    abline(0,1)

    [Figure: "Normal Q-Q Plot"; horizontal axis "Theoretical Quantiles", vertical axis "Sample Quantiles".]

      v. (4 marks) With $\hat{\sigma}^2$ as defined earlier, let $s^2_{\hat{\beta}} = \hat{\sigma}^2 / ss_{xx}$ and $T = (\hat{\beta} - \beta)/s_{\hat{\beta}}$. How would you use $T$ to test $H_0 : \beta = 0$ vs. $H_1 : \beta < 0$ if you observed a realisation $t_{\mathrm{obs}}$ of $T$? You should not forget to explain why using $T$ may be a good idea and state precisely the distribution of the random variables you may refer to.
      vi. (3 marks) We are still interested in testing $H_0 : \beta = 0$ vs. $H_1 : \beta < 0$ and type the following command:

    summary(fit)
    ##
    ## Call:
    ## lm(formula = dist ~ speed)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -29.069  -9.525  -2.272   9.215  43.201
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
    ## speed         3.9324     0.4155   9.464 1.49e-12 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 15.38 on 48 degrees of freedom
    ## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438
    ## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

      How would you deduce the p-value for this test from the output above? What is your conclusion? You should explain your reasoning carefully and in particular distinguish between the generic scenarios $t_{\mathrm{obs}} \ge 0$ and $t_{\mathrm{obs}} < 0$.

End of examination.