STAT3008: Applied Regression Analysis
2019/20 Term 2
PAST Mid-Term Examination and quick answers
Date: 7th April 2020 (Tuesday)
Time: 9:30am – 12:15pm (165 minutes)
Total Score: 100 points
Please present your answers in 4 significant figures.
Submission Requirement: (1) Name and SID on the 1st page of your work,
(2) Only a single file in .pdf or .doc* format (size < 10MB) will be accepted
(3) Filename in the format of “LAST NAME First Name – SID.pdf/doc*”
How to submit your exam work? A dropbox button is now available on Blackboard.
Problem 1 [27 points]: Suppose the following regression model is fitted to a data set with
observations {(xi1, xi2, yi), i = 1, 2, …, n}:
$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + e_i, \qquad e_i \overset{iid}{\sim} N(0, \sigma^2).$$
Assume that $\sum_{i=1}^{n} x_{i1} x_{i2} = 0$.
(a) [8 points] Derive the OLS estimates $\hat{\beta}_1$ and $\hat{\beta}_2$.
(b) [6 points] Set up the log-likelihood function $l(\beta_1, \beta_2, \sigma^2)$.
(c) [4 points] Do you expect the MLEs $\tilde{\beta}_1$ and $\tilde{\beta}_2$ to be the same as their corresponding
OLS estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ in part (a)? Explain. (No computation required)
(d) [5 points] Is $\hat{\beta}_1$ an unbiased estimator for $\beta_1$? Verify.
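(e) [4 points] Does the point
$$(x_1, x_2, y) = \left(\overline{x_1^2},\ \overline{x_2^2},\ \overline{x_1 y} + \overline{x_2 y}\right) = \left(\frac{1}{n}\sum x_{i1}^2,\ \frac{1}{n}\sum x_{i2}^2,\ \frac{1}{n}\sum x_{i1} y_i + \frac{1}{n}\sum x_{i2} y_i\right)$$
lie on the regression line based on the OLS estimates? Verify.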
Problem 2 [16 points]: Consider multiple linear regression $Y_{n \times 1} = X_{n \times (p+1)}\, \beta_{(p+1) \times 1} + e_{n \times 1}$ with
$E(e) = 0_{n \times 1}$ and $\mathrm{Var}(e) = \sigma^2 I_n$. Let $A = X(X'X)^{-1}X'$ and $B = I_n - X(X'X)^{-1}X'$.
(a) [4 points] Prove or disprove the following: $ABA = A$.
(b) [4 points] Prove or disprove the following: $A^5 = I_n - B^7$.
(c) [8 points] Simplify the following in terms of $\sigma^2$, n and p: $E\left[e'X(X'X)^{-1}X'Y\right]$.
Problem 3 [24 points]: A simple linear regression is fitted to the data {(x1, y1), …, (x48, y48)}, with
$$E(Y \mid X = x) = \beta_0 + \beta_1 x, \qquad \mathrm{Var}(Y \mid X = x) = \sigma^2.$$
The coefficient table and ANOVA table below show some of the regression results:
It is known that $R^2$ = 15%.
(a) [16 points] Replicate the two tables above and fill in ALL the missing values (in 4 significant
figures).
(b) [8 points] Based on the results in part (a), test whether β0 is greater than
-12.0 at α = 0.05. You should set up the 4 steps of hypothesis testing as on Ch2 page 64.
Note: R functions like “pf”, “pt”, “qf” and “qt” could be useful in this problem.
Problem 4 [19 points]: Consider multiple linear regression with 3 explanatory variables (EVs) x1, x2
and x3. Two hypothesis tests were performed on models with selected EVs, and the results are
summarized in the two ANOVA tables below:
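H0: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0$
vs H1: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
H0: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_3 x_3$
vs H1: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$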
It is known that the sample correlations between y and x1, x2 and x3 are 91.118%, -44.260% and
99.556% respectively. That is, $\hat{\rho}(y, x_1) = 91.118\%$, $\hat{\rho}(y, x_2) = -44.260\%$ and $\hat{\rho}(y, x_3) = 99.556\%$.
(a) [11 points] Replicate the table below, and fill in ALL the missing values (in 4 significant figures).
(df and RSS of Model 7: $E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_2 x_2 + \beta_3 x_3$
have already been included in the table)
(b) [4 points] Do you think multicollinearity exists in Model 8:
$E(Y \mid \mathbf{X} = \mathbf{x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$?
Explain.
(c) [4 points] Do you think the sample correlation between x1 and x2 (i.e. $\hat{\rho}(x_1, x_2)$) is close to 0?
Explain.
Problem 5 [14 points]: Suppose we are interested in explaining the sale price of a house by 4
variables relating to its size and age (grey columns below). The table below shows the data of the
first 6 houses in the data set:
A multiple linear regression of y = ln(SalePrice) on the 4 EVs was fitted. The table
below shows the parameter estimates:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.947e-01 4.175e-01 2.143 0.0323 *
Year 5.231e-03 2.151e-04 24.323 < 2e-16 ***
FirstFloor 3.378e-05 3.102e-05 1.089 0.2764
Basement -2.274e-04 2.948e-05 -7.714 2.6e-14 ***
Total 3.954e-04 1.446e-05 27.335 < 2e-16 ***
Residual standard error: 0.212 on 1169 degrees of freedom
Multiple R-squared: 0.7353, Adjusted R-squared: 0.7344
F-statistic: 811.8 on 4 and 1169 DF, p-value: < 2.2e-16
Note that most of the parameter estimates are intuitive. For example, $\hat{\beta}_{Year}$ = 0.005231 > 0 is
consistent with the expectation that a newer house (larger Year) sells at a higher price.
(a) [12 points] Based on the parameter estimates above, comment on whether each of the
following is consistent with your intuition:
(I) $\hat{\beta}_{Basement}$ = -0.0002274 < 0
(II) $\hat{\beta}_{Total}$ = 0.0003954 > $\hat{\beta}_{FirstFloor}$ = 0.00003378 > 0
(b) [2 points] What is the sample size n of the data set?
- End of the Exam -
Quick Answers
In the actual exam, you are required to show all the details of your work instead
of just the final answers below.
Problem 1:
(a) $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} x_{i1} y_i}{\sum_{i=1}^{n} x_{i1}^2}, \qquad \hat{\beta}_2 = \dfrac{\sum_{i=1}^{n} x_{i2} y_i}{\sum_{i=1}^{n} x_{i2}^2}$
(b) $l(\beta_1, \beta_2, \sigma^2) = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_1 x_{i1} - \beta_2 x_{i2}\right)^2$
(c) Yes. Explain.
(d) Yes. Verify.
(e) Yes. Verify.
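The answers to Problem 1 can be sanity-checked numerically. The sketch below is only an illustration, not part of the model answer: the data values and the helper name dot are made up, with the regressors chosen so that the sum of x_i1 * x_i2 is exactly zero. It compares the closed-form estimates from part (a) with the solution of the general 2x2 normal equations, and checks that the point in part (e) satisfies the fitted equation.

```python
# Hypothetical data, chosen so that sum(x1 * x2) == 0 as Problem 1 assumes.
x1 = [1.0, 2.0, -1.0, -2.0]
x2 = [2.0, -1.0, -2.0, 1.0]
y = [3.1, 0.9, -2.8, -1.2]
n = len(y)


def dot(u, v):
    """Sum of element-wise products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
s1y, s2y = dot(x1, y), dot(x2, y)
assert s12 == 0.0  # the orthogonality assumption of the problem

# Part (a): closed-form OLS estimates under orthogonality.
b1 = s1y / s11
b2 = s2y / s22

# General solution of the normal equations (Cramer's rule), for comparison.
det = s11 * s22 - s12 ** 2
b1_general = (s1y * s22 - s12 * s2y) / det
b2_general = (s11 * s2y - s12 * s1y) / det

# Part (e): the point (mean of x1^2, mean of x2^2, mean of x1*y + mean of x2*y).
point = (s11 / n, s22 / n, s1y / n + s2y / n)
fitted_at_point = b1 * point[0] + b2 * point[1]

print("closed form:", b1, b2)
print("normal equations:", b1_general, b2_general)
print("fitted value at the point:", fitted_at_point, "third coordinate:", point[2])
```

A numerical check of this kind does not replace the algebraic verification the exam asks for; it only guards against slips in the derivation.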
Problem 2:
(a) $ABA \neq A$
(b) $A^5 = … = I_n - B^7$
(c) $(p+1)\sigma^2$
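One possible line of argument behind the Problem 2 answers is sketched below. It assumes that X has full column rank p + 1 (so that the inverse of X'X exists) and that A is not the zero matrix; the detailed steps expected in the exam may differ.

```latex
\begin{align*}
A^2 &= X(X'X)^{-1}X'X(X'X)^{-1}X' = A, \qquad
B = I_n - A, \qquad
B^2 = I_n - 2A + A^2 = B, \\
ABA &= A(I_n - A)A = A^2 - A^3 = A - A = 0 \neq A, \\
A^5 &= A = I_n - B = I_n - B^7, \\
E\left[e'AY\right] &= E\left[e'A(X\beta + e)\right]
  = E(e)'X\beta + E\left[e'Ae\right]
  = 0 + \sigma^2\,\mathrm{tr}(A)
  = \sigma^2\,\mathrm{tr}\!\left((X'X)^{-1}X'X\right)
  = (p+1)\sigma^2 .
\end{align*}
```

The last line uses AX = X and the identity E(e'Ae) = tr(A Var(e)) + E(e)'A E(e).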
Problem 3:
(a)
(b) (Step 1) H0: β0 = -12.0 vs H1: β0 > -12.0
(Step 2) t0 = (-9.9081-(-12))/5.3871 = 0.3883
(Step 3) Since p-value = Pr(t46 > t0) = 0.3498 > 0.05, we do not reject H0 at α = 0.05.
(Step 4) We do not have sufficient evidence that β0 is greater than -12.0.
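The arithmetic in Steps 2 and 3 can be reproduced without R. The sketch below uses only the Python standard library; the helper names t_pdf and t_upper_tail are made up for this illustration, and the tail probability is approximated with Simpson's rule rather than taken from a library routine. It also evaluates the overall F statistic implied by R2 = 15% and n = 48, using the identity F = R2 / (1 - R2) × (n - 2), which holds for simple linear regression with an intercept.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate Pr(T > t) by Simpson's rule on [0, |t|]; steps must be even."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    # By symmetry, Pr(T > 0) = 0.5; subtract the area between 0 and t.
    return 0.5 - total * h / 3


n = 48
df = n - 2  # residual degrees of freedom in simple linear regression

# Step 2: test statistic for H0: beta0 = -12.0 against H1: beta0 > -12.0.
b0_hat, se_b0, b0_null = -9.9081, 5.3871, -12.0
t0 = (b0_hat - b0_null) / se_b0

# Step 3: one-sided p-value (the quick answer reports 0.3498).
p_value = t_upper_tail(t0, df)

# Overall F statistic implied by R^2 = 15% with 1 and 46 degrees of freedom.
r2 = 0.15
f_stat = r2 / (1 - r2) * df

print(f"t0 = {t0:.4f}, p-value = {p_value:.4f}, F = {f_stat:.4f}")
```

In R, the same tail probability would come from pt(t0, 46, lower.tail = FALSE).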
Problem 4:
(a)
(b) Yes. Explain… (material from Ch4, not in the upcoming midterm)
(c) Yes. Explain… (material from Ch4, not in the upcoming midterm)
Problem 5:
(a) From Ch4 (not in the upcoming midterm)
(b) n = 1174
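The value of n follows from the residual degrees of freedom in the R output: n = 1169 + 4 + 1, counting 4 slopes and 1 intercept. As a quick consistency check, the sketch below (variable names made up; all inputs are numbers printed in the summary) recomputes the reported F-statistic and adjusted R-squared from R-squared and the degrees of freedom.

```python
# Values read from the R summary in Problem 5.
df_resid = 1169    # residual degrees of freedom
num_slopes = 4     # Year, FirstFloor, Basement, Total
r2 = 0.7353        # Multiple R-squared

# Sample size: residual df = n - (number of slopes) - 1 (intercept).
n = df_resid + num_slopes + 1

# Overall F statistic on (num_slopes, df_resid) degrees of freedom.
f_stat = (r2 / num_slopes) / ((1 - r2) / df_resid)

# Adjusted R-squared.
adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid

print(f"n = {n}, F = {f_stat:.1f}, adjusted R-squared = {adj_r2:.4f}")
```

Small differences from the printed 811.8 and 0.7344 can arise because R-squared itself is rounded to 4 decimal places in the summary.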