STAT3008: Applied Regression Analysis
2020/21 Term 2 Mid-term Review (March 2nd 2021)
Chapter 1 Introduction
Terminologies: Response Variable (RV), Explanatory Variable (EV)
Mean Function and Variance Function
Separated Points: Outliers and Leverage Points
Scatterplot and Scatterplot Matrix
Null Plot: Constant mean and variance functions, no separated points
Chapter 2 Simple Linear Regression
OLS Estimates: Derivation
Terminologies: Fitted Value, Residuals, RSS
Properties of the OLS Estimates: Unbiasedness and Variance
Maximum Likelihood Estimates: Derivation and Comparison against OLS Estimates
Analysis of Variance:
Decomposition of Sum of Squares (SSreg, RSS, SStotal)
Construction of ANOVA table
ANOVA Table vs Coefficient Table
Coefficient of Determination $R^2$
Test for Betas, Confidence Interval for Fitted Value
Prediction Interval for $y^*$ based on new $x^*$
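The Chapter 2 quantities listed above (OLS estimates, fitted values, residuals, RSS, the sum-of-squares decomposition and the coefficient of determination) can be sketched numerically as below. This is a minimal illustration only: the data values are hypothetical, and the variable names (SXX, SXY, SYY, SSreg) are assumed to follow the usual course notation.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Corrected sums of squares and cross-products.
SXX = sum((xi - xbar) ** 2 for xi in x)
SXY = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
SYY = sum((yi - ybar) ** 2 for yi in y)  # SStotal

# OLS estimates for the simple linear regression of y on x.
beta1_hat = SXY / SXX
beta0_hat = ybar - beta1_hat * xbar

# Fitted values, residuals and the residual sum of squares.
fitted = [beta0_hat + beta1_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
RSS = sum(e ** 2 for e in residuals)

# Decomposition SStotal = SSreg + RSS, and R^2 = SSreg / SStotal.
SSreg = SYY - RSS
R2 = SSreg / SYY
sigma2_hat = RSS / (n - 2)  # unbiased estimate of the error variance

print(beta0_hat, beta1_hat, RSS, SSreg, R2, sigma2_hat)
```

With an intercept in the model, SSreg computed this way agrees with SXY^2/SXX, which is a quick consistency check on hand calculations.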
Chapter 3 Multiple Linear Regression
Random Vector, Model Setup
OLS Estimates: Derivation, Vector/Matrix Differentiation and Trace Operation
Properties of the OLS Estimates
Unbiasedness and Variance, Asymptotic Distribution of the OLS Estimates
Expected Value and Variance of Random Matrix
Maximum Likelihood Estimates
Analysis of Variance: Decomposition of Sum of Squares, Construction of ANOVA table
Coefficient of Determination $R^2$
Test for Betas, Confidence Interval for Fitted Value
Prediction Interval for $y^*$ based on new $x^*$
Questions from previous STAT3008 students
1. Is p-value computed based on one tail or two tails of the distribution?
2. How to compute SSreg if there is only one regression model specified in the problem?
Practice Exercises
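Problem 1: Consider multiple linear regression with $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ and
sample size n = 9. The coefficient table on the left shows the OLS estimates.
The ANOVA table on the right tests the hypotheses:
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$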
(a) Based on a T-statistic and a p-value from the Coefficient Table, construct the ANOVA table for
the hypotheses
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
(b) Based on the given ANOVA table and the results from part (a), construct the ANOVA table for
the hypotheses
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
(c) What conclusion can you draw from the ANOVA table from part (b)?
Problem 2: Suppose x1, x2, …, xn are known constants. Let y1, y2, …, yn be independent random
variables with mean 0 and variance 1. Let
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
(a) Simplify $\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})$ and $\sum_{i=1}^{n} y_i^2 - \sum_{i=1}^{n}(y_i - \bar{y})^2 - n\bar{y}^2$.
(b) Find $\mathrm{Var}\left(\sum_{i=1}^{n} \frac{x_i - \bar{x}}{\mathrm{SXX}}\, y_i\right)$.
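Problem 3: Suppose
$M = \begin{pmatrix} 1 & 3 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}$, $a = \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix}$, $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}$.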
(a) Let $f(\beta) = \beta' M \beta - a'\beta$. Express $\frac{\partial f(\beta)}{\partial \beta}$ in terms of β1, β2 and β3.
(b) What are the values of (β1, β2, β3) minimizing $f(\beta) = \beta' M \beta - a'\beta$ in part (a)?
(c) Let $g(\beta) = (1\ \ 1\ \ 1)\, M^{-1} \beta$. Express $\frac{\partial g(\beta)}{\partial \beta}$ in terms of β1, β2 and β3.
Problem 4: Consider an n×(p+1) matrix X. Let $H = X(X'X)^{-1}X'$.
(a) Simplify $(I - H)X$, where I is an identity matrix.
(b) Is H a symmetric matrix?
(c) Compute tr(H).
(d) Show that $H^{3008} = H$.
Problem 5: Which of the following is true about multiple linear regression?
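(i) When p=1, OLS estimate $\hat{\beta} = \begin{pmatrix} \bar{y} - (\mathrm{SXY}/\mathrm{SXX})\,\bar{x} \\ \mathrm{SXY}/\mathrm{SXX} \end{pmatrix}$.
(ii) When p=0, OLS estimate $\hat{\beta} = \bar{y}$.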
(iii) tr(Var(e)) = $p\sigma^2$, where tr(A) is the trace of a square matrix A.
(a) (i) and (ii) only
(b) (i) and (iii) only
(c) (ii) and (iii) only
(d) All of the above
Problem 6: Consider a multiple linear regression with response {yi, i = 1,2, … 30} and 3
explanatory variables {(xi1, xi2, xi3), i = 1,2, … 30}.
Suppose we want to use the Analysis of Variance (ANOVA) to test the two models below:
E(Y|X)= β0+ β1x1+ β3x3 vs E(Y|X)= β0+ β1x1+β2x2+β3x3
What are the degrees of freedom of the corresponding F-statistic?
(a) 1 and 26
(b) 1 and 27
(c) 2 and 26
(d) 2 and 27
SOLUTIONS/QUICK ANSWERS to the Practice Exercises – You need to show your work in full
detail, as in Problem 1 below, during the mid-term exam.
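Problem 1: (a) The T-statistic = 2.636 and p-value = 0.0462 for the coefficient of x3 in the coefficient table allow us to test
the hypotheses: H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$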
The corresponding ANOVA table is therefore given by:
(b) For the hypotheses H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Note that RSS1=55,423 (from the given ANOVA table), and RSS2=48,762 (ANOVA table in part (a)).
The corresponding ANOVA table is therefore given by:
(c) Since p-value = 0.40 > 0.05, we do not reject H0 at α=0.05.
We do not have sufficient evidence to reject the model mean function $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ at
α=0.05.
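The arithmetic behind parts (a) and (b) can be sketched as follows. The sketch uses only n = 9, the T-statistic and the two RSS values quoted above. It relies on the fact that an F(1, d) statistic is the square of a t(d) statistic, so the F p-value equals the two-sided t p-value. The helper names `t_pdf` and `f_pvalue_1df` are illustrative, and the p-values are approximated by numerical integration rather than read from a table.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def f_pvalue_1df(F, df2, steps=2000):
    """Upper-tail p-value of an F(1, df2) statistic.

    F(1, df2) is the square of a t(df2) variable, so the p-value equals the
    two-sided t p-value at sqrt(F). The integral of the t density over
    [0, sqrt(F)] is approximated with Simpson's rule (steps must be even).
    """
    b = math.sqrt(F)
    h = b / steps
    total = t_pdf(0.0, df2) + t_pdf(b, df2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df2)
    return 1 - 2 * (total * h / 3)

n = 9

# Part (a): a single coefficient is tested, so F = T^2 with (1, n - 4) df.
T = 2.636
F_a = T ** 2
df_a = (1, n - 4)
p_a = f_pvalue_1df(F_a, df_a[1])

# Part (b): compare the x1-only model with the (x1, x2) model.
RSS1 = 55423.0  # residual SS under H0 (x1 only)
RSS2 = 48762.0  # residual SS under H1 (x1 and x2), residual df = n - 3
SSreg = RSS1 - RSS2
df_b = (1, n - 3)
F_b = (SSreg / df_b[0]) / (RSS2 / df_b[1])
p_b = f_pvalue_1df(F_b, df_b[1])

print(f"(a) F = {F_a:.3f} on {df_a} df, p-value = {p_a:.4f}")
print(f"(b) SSreg = {SSreg:.0f}, F = {F_b:.3f} on {df_b} df, p-value = {p_b:.2f}")
```

The part (b) p-value of about 0.40 is the one used in part (c).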
Problem 2: (a) 0 and 0 (b) 1/SXX
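A short derivation of these answers is sketched below. It assumes the usual definition $\mathrm{SXX} = \sum_{i=1}^{n}(x_i - \bar{x})^2$ and uses the independence and unit variance of the $y_i$.

```latex
\begin{align*}
\text{(a)}\quad \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})
  &= \frac{1}{n}\sum_{i=1}^{n} y_i - \bar{y} = \bar{y} - \bar{y} = 0, \\
\sum_{i=1}^{n}(y_i - \bar{y})^2
  &= \sum_{i=1}^{n} y_i^2 - 2\bar{y}\sum_{i=1}^{n} y_i + n\bar{y}^2
   = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2, \\
\text{so}\quad \sum_{i=1}^{n} y_i^2 - \sum_{i=1}^{n}(y_i - \bar{y})^2 - n\bar{y}^2 &= 0. \\[6pt]
\text{(b)}\quad \mathrm{Var}\!\left(\sum_{i=1}^{n}\frac{x_i - \bar{x}}{\mathrm{SXX}}\,y_i\right)
  &= \sum_{i=1}^{n}\frac{(x_i - \bar{x})^2}{\mathrm{SXX}^2}\,\mathrm{Var}(y_i)
   = \frac{\mathrm{SXX}}{\mathrm{SXX}^2} = \frac{1}{\mathrm{SXX}}.
\end{align*}
```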
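Problem 3:
(a) $\frac{\partial f(\beta)}{\partial \beta} = \begin{pmatrix} 2\beta_1 + 4\beta_2 \\ 4\beta_1 + 4\beta_2 - 2 \\ 8\beta_3 - 4 \end{pmatrix}$, (b) (β1, β2, β3) = (1, -0.5, 0.5)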
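(c) $M^{-1} = \begin{pmatrix} -2 & 3 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 1/4 \end{pmatrix}$, $\frac{\partial g(\beta)}{\partial \beta} = \begin{pmatrix} -1 \\ 2 \\ 1/4 \end{pmatrix}$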
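These answers can be checked with exact arithmetic, as in the sketch below. It reads the problem as $f(\beta) = \beta' M \beta - a'\beta$ and $g(\beta) = (1\ 1\ 1) M^{-1}\beta$ with M having rows (1, 3, 0), (1, 2, 0), (0, 0, 4) and a = (0, 2, 4)'. It uses the standard results that the gradient of $\beta' M \beta$ is $(M + M')\beta$ and the gradient of $c'\beta$ is $c$. The function and variable names are illustrative.

```python
from fractions import Fraction as Fr

M = [[Fr(1), Fr(3), Fr(0)],
     [Fr(1), Fr(2), Fr(0)],
     [Fr(0), Fr(0), Fr(4)]]
a = [Fr(0), Fr(2), Fr(4)]

def grad_f(beta):
    """Gradient of f(beta) = beta'M beta - a'beta, i.e. (M + M')beta - a."""
    return [sum((M[i][j] + M[j][i]) * beta[j] for j in range(3)) - a[i]
            for i in range(3)]

# Part (b): the gradient vanishes at the stated point.
beta_star = [Fr(1), Fr(-1, 2), Fr(1, 2)]
print(grad_f(beta_star))

# Part (c): check the stated inverse, then take the column sums of M^{-1},
# which form the gradient of g(beta) = (1 1 1) M^{-1} beta.
M_inv = [[Fr(-2), Fr(3), Fr(0)],
         [Fr(1), Fr(-1), Fr(0)],
         [Fr(0), Fr(0), Fr(1, 4)]]
product = [[sum(M[i][k] * M_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
identity = [[Fr(int(i == j)) for j in range(3)] for i in range(3)]
grad_g = [sum(M_inv[i][j] for i in range(3)) for j in range(3)]
print(product == identity, grad_g)

# Determinant of the leading 2x2 block of the Hessian M + M'.
hess = [[M[i][j] + M[j][i] for j in range(3)] for i in range(3)]
det2 = hess[0][0] * hess[1][1] - hess[0][1] * hess[1][0]
print(det2)
```

One caveat: the leading 2×2 block of the Hessian M + M' has determinant -8, so the Hessian is indefinite. The point in part (b) is therefore the unique stationary point of f, obtained by setting the gradient to zero, rather than a global minimizer.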
Problem 4:
(a) $0_{n\times(p+1)}$, the n×(p+1) zero matrix
(b) H is symmetric since $H' = [X(X'X)^{-1}X']' = X[(X'X)']^{-1}X' = X(X'X)^{-1}X' = H$
(c) p+1
(d) Note that $H^2 = X(X'X)^{-1}X'X(X'X)^{-1}X' = X(X'X)^{-1}X' = H$.
Hence, $H^{3008} = H^{3006}(H^2) = H^{3006}(H)$ (since $H^2 = H$)
$= H^{3007} = H^{3006} = \dots = H$
based on a similar argument.
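The four properties can also be verified on a small example. The sketch below uses a hypothetical design matrix with an intercept column and one predictor (n = 4, p = 1) and exact rational arithmetic, so the equalities hold exactly rather than up to rounding. The helper functions are illustrative and only handle the 2×2 inverse needed here.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inverse_2x2(A):
    (p, q), (r, s) = A
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical design matrix: intercept column plus one predictor.
X = [[Fr(1), Fr(x)] for x in (1, 2, 4, 7)]
Xt = transpose(X)
H = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)

n = len(X)
I = [[Fr(int(i == j)) for j in range(n)] for i in range(n)]
I_minus_H = [[I[i][j] - H[i][j] for j in range(n)] for i in range(n)]

print(matmul(I_minus_H, X))            # (a) every entry is zero
print(H == transpose(H))               # (b) True
print(sum(H[i][i] for i in range(n)))  # (c) 2, i.e. p + 1
print(matmul(H, H) == H)               # (d) True, so every power of H equals H
```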
Problem 5: (a)
Problem 6: (a)
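The degrees of freedom in Problem 6 follow from counting parameters, as sketched below. The reduced model has 3 regression parameters (β0, β1, β3) and the full model has 4, with n = 30 in both cases.

```latex
\begin{align*}
\text{residual df (reduced model)} &= 30 - 3 = 27, \\
\text{residual df (full model)}    &= 30 - 4 = 26, \\
\text{numerator df}   &= 27 - 26 = 1, \\
\text{denominator df} &= 26,
\qquad\text{so}\quad F \sim F(1, 26) \text{ under } H_0 .
\end{align*}
```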