STAT3008: Applied Regression Analysis
2020/21 Term 2 Mid-term Review (March 2nd 2021)

Chapter 1 Introduction
Terminology: Response Variable (RV), Explanatory Variable (EV)
Mean Function and Variance Function
Separated Points: Outlier and Leverage Point
Scatterplot and Scatterplot Matrix
Null Plot: Constant mean and variance functions, no separated points

Chapter 2 Simple Linear Regression
OLS Estimates: Derivation
Properties of the OLS Estimates: Unbiasedness and Variance
Maximum Likelihood Estimates: Derivation and Comparison against OLS Estimates
Analysis of Variance:
Decomposition of Sum of Squares (SSreg, RSS, SStotal)
Construction of ANOVA table
ANOVA Table vs Coefficient Table
Coefficient of Determination $R^2$
Test for Betas, Confidence Interval for Fitted Value
Prediction Interval for $y^*$ based on new $x^*$

Chapter 3 Multiple Linear Regression
Random Vector, Model Setup
OLS Estimates: Derivation, Vector/Matrix Differentiation and Trace Operation
Properties of the OLS Estimates
Unbiasedness and Variance, Asymptotic Distribution of the OLS Estimates
Expected Value and Variance of Random Matrix
Maximum Likelihood Estimates
Analysis of Variance: Decomposition of Sum of Squares, Construction of ANOVA table
Coefficient of Determination $R^2$
Test for Betas, Confidence Interval for Fitted Value
Prediction Interval for $y^*$ based on new $x^*$

Questions from previous STAT3008 students
1. Is the p-value computed based on one tail or two tails of the distribution?
2. How do we compute SSreg if only one regression model is specified in the problem?
Practice Exercises
Problem 1: Consider multiple linear regression with $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ and
sample size n = 9. The coefficient table on the left shows the OLS estimates.

The ANOVA table on the right tests the hypotheses:
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
(a) Based on a T-statistic and a p-value from the Coefficient Table, construct the ANOVA table for
the hypotheses
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
(b) Based on the given ANOVA table and the results from part (a), construct the ANOVA table for
the hypotheses
H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
(c) What conclusion can you draw from the ANOVA table from part (b)?

Problem 2: Suppose $x_1, x_2, \ldots, x_n$ are known constants. Let $y_1, y_2, \ldots, y_n$ be independent random
variables with mean 0 and variance 1. Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
(a) Simplify $\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})$ and $\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} y_i^2 + n\bar{y}^2$.
(b) Find $\mathrm{Var}\left( \sum_{i=1}^{n} \frac{x_i - \bar{x}}{SXX}\, y_i \right)$.

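Problem 3: Suppose
$\mathbf{M} = \begin{pmatrix} 1 & 3 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}, \quad \mathbf{a} = \begin{pmatrix} 0 \\ 2 \\ 4 \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}$
(a) Let $f(\boldsymbol{\beta}) = \boldsymbol{\beta}'\mathbf{M}\boldsymbol{\beta} - \mathbf{a}'\boldsymbol{\beta}$. Express $\frac{\partial f(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$ in terms of β1, β2 and β3.
(b) What are the values of (β1, β2, β3) minimizing $f(\boldsymbol{\beta}) = \boldsymbol{\beta}'\mathbf{M}\boldsymbol{\beta} - \mathbf{a}'\boldsymbol{\beta}$ in part (a)?
(c) Let $g(\boldsymbol{\beta}) = (1 \;\; 1 \;\; 1)\,\mathbf{M}^{-1}\boldsymbol{\beta}$. Express $\frac{\partial g(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$ in terms of β1, β2 and β3.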

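Problem 4: Consider an n×(p+1) matrix X. Let $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$.
(a) Simplify $(\mathbf{I} - \mathbf{H})\mathbf{X}$, where I is an identity matrix.
(b) Is H a symmetric matrix?
(c) Compute tr(H).
(d) Show that $\mathbf{H}^{3008} = \mathbf{H}$.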

Problem 5: Which of the following is true about multiple linear regression?
(i) When p=1, OLS estimate $\hat{\boldsymbol{\beta}} = \begin{pmatrix} \bar{y} - (SXY/SXX)\,\bar{x} \\ SXY/SXX \end{pmatrix}$.
(ii) When p=0, OLS estimate $\hat{\beta} = \bar{y}$.
(iii) $\mathrm{tr}(\mathrm{Var}(\mathbf{e})) = p\sigma^2$, where tr(A) is the trace of a square matrix A.
(a) (i) and (ii) only
(b) (i) and (iii) only
(c) (ii) and (iii) only
(d) All of the above

Problem 6: Consider a multiple linear regression with response {yi, i = 1,2, … 30} and 3
explanatory variables {(xi1, xi2, xi3), i = 1,2, … 30}.
Suppose we want to use the Analysis of Variance (ANOVA) to test the two models below:
E(Y|X)= β0+ β1x1+ β3x3 vs E(Y|X)= β0+ β1x1+β2x2+β3x3
What are the degrees of freedom of the corresponding F-statistic?
(a) 1 and 26
(b) 1 and 27
(c) 2 and 26
(d) 2 and 27

SOLUTIONS/QUICK ANSWERS to the Practice Exercises – You need to show your work in full
detail, as in Problem 1 below, during the mid-term exam.

Problem 1: (a) The T-statistic = 2.636 and p-value = 0.0462 from the coefficient table allow us to test
the hypotheses: H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
The corresponding ANOVA table is therefore given by:

(b) For the hypotheses H0: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y \mid X = x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Note that RSS1=55,423 (from the given ANOVA table), and RSS2=48,762 (ANOVA table in part (a)).
The corresponding ANOVA table is therefore given by:
(c) Since the p-value = 0.40 > 0.05, we do not reject H0 at α = 0.05.
We do not have sufficient evidence to reject the model mean function $E(Y \mid X = x) = \beta_0 + \beta_1 x_1$ at
α = 0.05.
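
The F-statistics behind parts (a) and (b) can be checked numerically. The sketch below is an illustration only, not the required exam working. It assumes the residual degrees of freedom follow from n = 9 (5, 6 and 7 for the models with three, two and one explanatory variables), uses the fact that an F statistic on (1, m) degrees of freedom is the square of a t statistic on m degrees of freedom, and gets p-values from `t_two_sided_p`, a hypothetical helper written for this sketch that integrates the t density with Simpson's rule.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| > |t|) for T ~ t(df).

    Hypothetical helper for illustration: integrates the t density over
    [0, |t|] with Simpson's rule (steps must be even).
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    t = abs(t)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 1 - 2 * (total * h / 3)

n = 9

# Part (a): one term is dropped from the full model, so F = T^2 on (1, n - 4) df.
t_stat = 2.636
df_full = n - 4
f_a = t_stat ** 2
p_a = t_two_sided_p(t_stat, df_full)

# Part (b): RSS values quoted in the solution above.
rss1, df1 = 55423, n - 2   # model with x1 only
rss2, df2 = 48762, n - 3   # model with x1 and x2
f_b = ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_b = t_two_sided_p(math.sqrt(f_b), df2)

print(f"(a) F = {f_a:.3f} on (1, {df_full}) df, p-value = {p_a:.4f}")
print(f"(b) F = {f_b:.3f} on ({df1 - df2}, {df2}) df, p-value = {p_b:.2f}")
```

The p-values printed should agree with the 0.0462 and 0.40 quoted in parts (a) and (c).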

Problem 2: (a) 0 and 0 (b) 1/SXX
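
A sketch of the working behind these quick answers, using only the definitions in Problem 2 together with independence and Var(y_i) = 1. It assumes the usual definition $SXX = \sum_{i=1}^{n} (x_i - \bar{x})^2$, which the problem does not restate.

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})
  &= \frac{1}{n}\sum_{i=1}^{n} y_i - \bar{y} = \bar{y} - \bar{y} = 0, \\
\sum_{i=1}^{n} (y_i - \bar{y})^2
  &= \sum_{i=1}^{n} y_i^2 - 2\bar{y}\sum_{i=1}^{n} y_i + n\bar{y}^2
   = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2, \\
\mathrm{Var}\!\left( \sum_{i=1}^{n} \frac{x_i - \bar{x}}{SXX}\, y_i \right)
  &= \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{SXX^2}\, \mathrm{Var}(y_i)
   = \frac{SXX}{SXX^2} = \frac{1}{SXX}.
\end{aligned}
```

The second line is the identity that makes the second expression in part (a) vanish; the third line uses independence so that the variance of the sum is the sum of the variances.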

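Problem 3:
(a) $\frac{\partial f(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \begin{pmatrix} 2\beta_1 + 4\beta_2 \\ 4\beta_1 + 4\beta_2 - 2 \\ 8\beta_3 - 4 \end{pmatrix}$, (b) (β1, β2, β3) = (1, -0.5, 0.5)
(c) $\mathbf{M}^{-1} = \begin{pmatrix} -2 & 3 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 1/4 \end{pmatrix}$, $\frac{\partial g(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \begin{pmatrix} -1 \\ 2 \\ 1/4 \end{pmatrix}$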
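
These answers can be verified with exact arithmetic. The sketch below assumes f(β) = β'Mβ − a'β and g(β) = (1 1 1)M⁻¹β, with M and a as read from the problem statement, and uses the standard results ∂(β'Mβ)/∂β = (M + M')β and ∂(c'β)/∂β = c. The helper functions are written for this illustration only.

```python
from fractions import Fraction as F

# M and a as given in Problem 3.
M = [[1, 3, 0],
     [1, 2, 0],
     [0, 0, 4]]
a = [0, 2, 4]

def matvec(A, v):
    """Matrix-vector product for nested lists."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def matmul(A, B):
    """Matrix-matrix product for nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# (a) Gradient of f(beta) = beta'M beta - a'beta is (M + M')beta - a.
S = [[M[i][j] + M[j][i] for j in range(3)] for i in range(3)]

def grad_f(beta):
    return [g - ai for g, ai in zip(matvec(S, beta), a)]

# (b) The quoted answer should make the gradient vanish.
beta_star = [F(1), F(-1, 2), F(1, 2)]
grad_at_star = grad_f(beta_star)

# (c) Check the quoted inverse, then the gradient of g(beta) = (1 1 1) M^{-1} beta,
# which is the constant vector (M^{-1})'(1 1 1)'.
M_inv = [[F(-2), F(3), F(0)],
         [F(1), F(-1), F(0)],
         [F(0), F(0), F(1, 4)]]
identity_check = matmul(M, M_inv)
grad_g = matvec(transpose(M_inv), [1, 1, 1])

print("M + M' =", S)
print("gradient of f at (1, -1/2, 1/2):", grad_at_star)
print("M M^{-1} =", identity_check)
print("dg/dbeta =", grad_g)
```

One point worth noting when writing up part (b): the leading 2×2 block of M + M' has determinant 2·4 − 4·4 = −8 < 0, so M + M' is not positive definite, and the quoted point is strictly the solution of ∂f/∂β = 0.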

Problem 4:
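(a) $\mathbf{0}_{n \times (p+1)}$
(b) H is symmetric since $\mathbf{H}' = (\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}')' = \mathbf{X}((\mathbf{X}'\mathbf{X})')^{-1}\mathbf{X}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{H}$
(c) p+1
(d) Note that $\mathbf{H}^2 = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{H}$.
Hence, $\mathbf{H}^{3008} = \mathbf{H}^{3006}(\mathbf{H}^2) = \mathbf{H}^{3006}(\mathbf{H})$ (since $\mathbf{H}^2 = \mathbf{H}$)
$= \mathbf{H}^{3007} = \mathbf{H}^{3006} = \cdots = \mathbf{H}$
by the same argument.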
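
The four properties can be checked exactly on a small example. The sketch below is illustrative only: the design matrix is a made-up case with n = 4 and p = 1 (an intercept column plus one regressor), the helper functions are written for this sketch, and H⁸ stands in for H³⁰⁰⁸ because idempotence makes every higher power equal to H.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix-matrix product for nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inverse_2x2(A):
    """Exact inverse of a 2x2 matrix via the adjugate formula."""
    (p, q), (r, s) = A
    det = Fraction(p * s - q * r)
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical design matrix: n = 4 rows, intercept plus one regressor (p = 1).
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Xt = transpose(X)
H = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)

n = len(X)
I = [[int(i == j) for j in range(n)] for i in range(n)]
I_minus_H = [[I[i][j] - H[i][j] for j in range(n)] for i in range(n)]

residual_of_X = matmul(I_minus_H, X)        # (a) should be the n x (p+1) zero matrix
is_symmetric = H == transpose(H)            # (b)
trace_H = sum(H[i][i] for i in range(n))    # (c) should equal p + 1 = 2
H2 = matmul(H, H)                           # (d) idempotence: H^2 = H
H_power = H
for _ in range(7):
    H_power = matmul(H_power, H)            # H^8, a small stand-in for H^3008

print(residual_of_X, is_symmetric, trace_H, H2 == H, H_power == H)
```

Using `Fraction` keeps every entry exact, so the equality checks are not affected by floating-point rounding.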

Problem 5: (a)
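Problem 6: (a), since the numerator df is 4 − 3 = 1 (one coefficient, β2, is dropped) and the denominator df is n − 4 = 30 − 4 = 26.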