University of California, Los Angeles
Department of Statistics

Statistics 100C
Instructor: Nicolas Christou

Homework 8

Exercise 1
Answer the following questions:

a. Consider the multiple regression model $Y = X\beta + \epsilon$, subject to a set of $m$ linear restrictions of the form $C\beta = \gamma$ with $\gamma \neq 0$. The matrix $C$ is $m \times (k + 1)$; assume that the last $m$ columns of $C$ are linearly independent, so that $C$ can be partitioned as $(C_1, C_2)$, where $C_2$ is nonsingular. (The columns of $C_2$ are the last $m$ columns of $C$.) Transform the model into $Y_r = X_r\beta_1 + \epsilon$ and explain how to estimate $\beta$.

b. Consider the constrained least squares problem. If we denote by $\hat{Y}$ the unconstrained fitted values and by $\hat{Y}_c$ the constrained fitted values, show that
$$(Y - \hat{Y}_c)'(Y - \hat{Y}_c) = (Y - \hat{Y})'(Y - \hat{Y}) + (\hat{Y} - \hat{Y}_c)'(\hat{Y} - \hat{Y}_c).$$

Exercise 2
Answer the following questions:

a. For the constrained least squares problem, let $\lambda$ be the Lagrange multiplier. Show that
$$SSE_c - SSE = \sigma^2 \lambda' [\mathrm{var}(\lambda)]^{-1} \lambda,$$
where $SSE_c$ is the constrained error sum of squares and $SSE$ is the error sum of squares when there are no constraints.

b. Consider the usual multiple regression model. Show that
$$E(\hat{\beta}'\hat{\beta}) = \beta'\beta + \sigma^2 \sum_{i=1}^{k+1} \frac{1}{\lambda_i},$$
where $\hat{\beta}$ is the ordinary least squares estimator of $\beta$ and $\lambda_i$, $i = 1, \ldots, k + 1$, are the eigenvalues of $X'X$.

c. Consider a multiple regression problem with $k = 5$ predictors. The Gauss-Markov conditions hold and also $\epsilon_i \sim N(0, \sigma)$. Find the distribution and the pdf of
$$Q = \begin{pmatrix} \hat{\beta}_1 - 2\hat{\beta}_2 + 3\hat{\beta}_4 \\ \hat{\beta}_0 + \hat{\beta}_4 + 3\hat{\beta}_5 \end{pmatrix}.$$

Exercise 3
Consider the multiple regression model $Y = X\beta + \epsilon$. Let $X = [X_1, X_2]$. Assume there is no intercept in the model. Regress $Y$ on $X_1$ and obtain the residuals $Y^*$ from this regression. Then regress each column of $X_2$ on $X_1$ to get a matrix of residuals $X_2^*$. Finally, regress the residuals from the first regression on the residuals from the second regression to obtain the partial coefficient $\hat{\beta}_{2.1}$. Show that $Y^* - X_2^*\hat{\beta}_{2.1}$ is orthogonal to $X$.

Exercise 4
Answer the following questions:

a.
Let $Y_1 = \alpha_1 + \epsilon_1$, $Y_2 = 2\alpha_1 - \alpha_2 + \epsilon_2$, $Y_3 = \alpha_1 + 2\alpha_2 + \epsilon_3$, where $\epsilon \sim N_3(0, \sigma^2 I)$. Show that the F test for testing $H_0: \alpha_1 = \alpha_2$ is given by
$$F = \frac{(\hat{\alpha}_1 - \hat{\alpha}_2)^2}{\frac{11}{30} S_e^2}.$$

b. Let $Y_1 = \theta_1 + \theta_2 + \epsilon_1$, $Y_2 = 2\theta_2 + \epsilon_2$, $Y_3 = -\theta_1 + \theta_2 + \epsilon_3$, where $\epsilon \sim N_3(0, \sigma^2 I)$. Explain how you would use the canonical form to derive the F statistic for testing $H_0: \theta_1 = 2\theta_2$.

Exercise 5
Access the following data:

    a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics_c173_c273/jura.txt", header=TRUE)

These Jura data were collected by the Swiss Federal Institute of Technology at Lausanne. See Goovaerts, P. 1997, "Geostatistics for Natural Resources Evaluation", Oxford University Press, New York, 483 p. for more details. Data were recorded at 359 locations scattered in space (see figure below).

[Figure: "The Jura data set". Scatter plot of the sampling locations, with x on the horizontal axis and y on the vertical axis.]

Concentrations of seven heavy metals (cadmium, cobalt, chromium, copper, nickel, lead, and zinc) in the topsoil were measured at each location. The type of land use and the rock type were also recorded for each location.

    > names(a)
     [1] "x"       "y"       "Landuse" "Rock"    "Cd"
     [6] "Co"      "Cr"      "Cu"      "Ni"      "Pb"
    [11] "Zn"

    y <- a$Pb
    x1 <- a$Cd
    x2 <- a$Co
    x3 <- a$Cr
    x4 <- a$Cu
    x5 <- a$Ni
    x6 <- a$Zn

The variables x, y are the coordinates. Landuse and Rock represent the type of land use (forest, pasture, meadow, tillage) and the rock type (Argovian, Kimmeridgian, Sequanian, Portlandian, and Quaternary).
The other variables are concentrations in ppm of the following chemical elements: Cd: Cadmium, Co: Cobalt, Cr: Chromium, Cu: Copper, Ni: Nickel, Pb: Lead, Zn: Zinc.

Answer the following questions:

a. Consider the full model (regression of y on x1, x2, x3, x4, x5, x6). Construct the $X$ matrix.

b. Compute $\hat{\beta}$, $s_e^2$, and $H$ of the full model.

c. Test the overall significance of the model using the F test for the general linear hypothesis:
$$F = \frac{(C\hat{\beta} - \gamma)' \left[ C(X'X)^{-1}C' \right]^{-1} (C\hat{\beta} - \gamma)}{m s_e^2}.$$

d. Use the F test for the general linear hypothesis to test:
$$H_0: (\beta_1, \beta_3)' = 0 \qquad H_a: (\beta_1, \beta_3)' \neq 0$$

e. Test the hypothesis of question (d) using the extra sum of squares principle.
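
As a rough illustration of how the F statistic in (c) relates to the extra sum of squares principle in (e), the following R sketch evaluates both on simulated data. It is not a solution for the Jura data: the function name `glh_f_test`, the simulated predictors, and the choice of three predictors are assumptions made only for this example.

```r
# Sketch: F test for the general linear hypothesis H0: C beta = gamma.
# glh_f_test is a hypothetical helper name, not part of any package.
glh_f_test <- function(X, y, C, gamma) {
  n <- nrow(X)                      # number of observations
  p <- ncol(X)                      # k + 1 columns, including the intercept
  m <- nrow(C)                      # number of linear restrictions
  XtX_inv <- solve(crossprod(X))    # (X'X)^{-1}
  beta_hat <- XtX_inv %*% crossprod(X, y)
  resid <- y - X %*% beta_hat
  s2e <- sum(resid^2) / (n - p)     # unbiased estimate of sigma^2
  d <- C %*% beta_hat - gamma
  F_stat <- as.numeric(t(d) %*% solve(C %*% XtX_inv %*% t(C)) %*% d) / (m * s2e)
  list(F = F_stat, df1 = m, df2 = n - p,
       p_value = pf(F_stat, m, n - p, lower.tail = FALSE))
}

# Simulated data (assumption: stands in for the real predictors).
set.seed(1)
n <- 50
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + 0.5 * x2 + rnorm(n)
X <- cbind(1, x1, x2, x3)

# H0: (beta1, beta3)' = 0, written as C beta = gamma.
C <- rbind(c(0, 1, 0, 0),
           c(0, 0, 0, 1))
gamma <- c(0, 0)
res <- glh_f_test(X, y, C, gamma)

# Extra sum of squares: compare the reduced model (x2 only) with the full model.
full <- lm(y ~ x1 + x2 + x3)
reduced <- lm(y ~ x2)
F_extra <- anova(reduced, full)$F[2]

print(res$F)
print(F_extra)
```

The two printed values should agree up to floating-point error, since for restrictions that set coefficients to zero the general linear hypothesis F statistic and the extra sum of squares F statistic are algebraically the same quantity.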