HW 4: Due March 14, 23:59:59
Problem 1
(SW 5.1) Suppose a researcher, using data on class size (CS) and average test scores from 50
third-grade classes, estimates the OLS regression
\[ \widehat{TestScore} = \underset{(23.5)}{640.3} - \underset{(2.02)}{4.93} \times CS, \qquad R^2 = 0.11, \quad SER = 8.7. \]
(1) Construct a 95% confidence interval for β1, the regression slope coefficient.
(2) Calculate the p-value for the two-sided test of the null hypothesis H0 : β1 = 0. Do you reject the
null hypothesis at the 5% level? At the 1% level?
(3) Calculate the p-value for the two-sided test of the null hypothesis H0 : β1 = −5.0.
(4) Without doing any additional calculations, determine whether -5.0 is contained in the 95%
confidence interval for β1.
Ans:
(1) The 95% C.I. is given by:
\[ \left( -4.93 - 1.96 \times s.e.(\hat{\beta}_1),\ -4.93 + 1.96 \times s.e.(\hat{\beta}_1) \right) = (-4.93 - 1.96 \times 2.02,\ -4.93 + 1.96 \times 2.02) = (-8.889,\ -0.9708). \]
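The interval arithmetic above can be checked with a short Python sketch. The variable names are illustrative, and 1.96 is the usual large-sample normal critical value assumed throughout these answers.

```python
# Values read off the reported regression in Problem 1.
beta1_hat = -4.93   # estimated slope on class size (CS)
se_beta1 = 2.02     # standard error of the slope
z_crit = 1.96       # two-sided 5% critical value of the standard normal

lower = beta1_hat - z_crit * se_beta1
upper = beta1_hat + z_crit * se_beta1
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")
```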
(2) To compute the p-value, first compute the t-statistic:
\[ t = \frac{-4.93 - 0}{s.e.(\hat{\beta}_1)} = \frac{-4.93}{2.02} = -2.44. \]
Since the sample size is reasonably large (n = 50), we can use the normal approximation. Therefore,
\[ p\text{-value} = \Pr(|z| > |t^{act}|) = \Pr(|z| > 2.44) = 2 \times 0.00734 \approx 0.015. \]
Hence, we can reject H0 at the 5% significance level, but not at the 1% level.
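As a rough check of the t-statistic and p-value, the sketch below uses the standard normal CDF from Python's standard library in place of a printed z-table. The variable names are illustrative, and the normal approximation is the same assumption made in the answer.

```python
from statistics import NormalDist

beta1_hat, se_beta1 = -4.93, 2.02
null_value = 0.0  # H0: beta1 = 0

t_stat = (beta1_hat - null_value) / se_beta1
# Two-sided p-value under the large-sample normal approximation.
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

reject_at_5 = p_value < 0.05
reject_at_1 = p_value < 0.01
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
print(f"reject at 5%: {reject_at_5}, reject at 1%: {reject_at_1}")
```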
(3) First, compute the t-statistic:
\[ t = \frac{-4.93 - (-5)}{s.e.(\hat{\beta}_1)} = \frac{0.07}{2.02} \approx 0.035. \]
Therefore,
\[ p\text{-value} = \Pr(|z| > |t^{act}|) = \Pr(|z| > 0.035) \approx 2 \times 0.486 \approx 0.97. \]
Note that this p-value is much larger than 5%, so the null hypothesis cannot be rejected at the 5%
significance level.
(4) Yes, −5.0 is contained in the 95% confidence interval, because H0 : β1 = −5 cannot be rejected
at the 5% level (as per (3)).
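The link between (3) and (4) is that a two-sided test at the 5% level fails to reject exactly when the hypothesized value lies inside the 95% confidence interval. The sketch below illustrates this for the hypothesized value −5.0; the variable names are illustrative.

```python
from statistics import NormalDist

beta1_hat, se_beta1 = -4.93, 2.02
null_value = -5.0  # H0: beta1 = -5.0
z_crit = 1.96

t_stat = (beta1_hat - null_value) / se_beta1
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

lower = beta1_hat - z_crit * se_beta1
upper = beta1_hat + z_crit * se_beta1

not_rejected = abs(t_stat) <= z_crit       # test-based statement
inside_ci = lower <= null_value <= upper   # interval-based statement
print(f"t = {t_stat:.3f}, p-value = {p_value:.2f}")
print(f"not rejected at 5%: {not_rejected}, inside 95% CI: {inside_ci}")
```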
Problem 2
(SW 5.4) Read the box “Economic Value of a Year of Education: Homoskedasticity or
Heteroskedasticity?” in Section 5.4 of our Stock and Watson textbook. Use the regression reported
in Equation (5.23), reproduced below, to answer the following:
\[ \widehat{Earnings} = \underset{(1.36)}{-12.12} + \underset{(0.10)}{2.37} \times YrsEd \]
(1) A randomly selected 30-year-old worker reports an education level of 16 years. What is the
worker’s expected average hourly earnings?
(2) A high school graduate (12 years of education) is contemplating going to a community college
for a 2-year degree. How much are this worker’s average hourly earnings expected to increase?
(3) A high school counselor tells a student that, on average, college graduates earn $10 per hour
more than high school graduates. Is this statement consistent with the regression evidence? What
range of values is consistent with the regression evidence?
Ans:
(1) −12.12 + 2.37 × 16 = $25.80 per hour.
(2) (−12.12 + 2.37× 14)− (−12.12 + 2.37× 12) = 2.37× 2 = $4.74 per hour.
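The two predictions in (1) and (2) can be reproduced with a small helper. The function name `predicted_earnings` is hypothetical; the coefficients are the ones reported in the regression.

```python
def predicted_earnings(years_of_education):
    """Predicted average hourly earnings (dollars) from the reported regression."""
    return -12.12 + 2.37 * years_of_education

earnings_16_years = predicted_earnings(16)
# The intercept cancels, so the gain depends only on the slope.
gain_from_2_years = predicted_earnings(14) - predicted_earnings(12)

print(f"16 years of education: ${earnings_16_years:.2f} per hour")
print(f"gain from 2 extra years: ${gain_from_2_years:.2f} per hour")
```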
(3) Going to college adds 4 years of education. Similar to (2), the expected increment in earnings
is 2.37 × 4 = $9.48 per hour. This seems about consistent with the counselor’s claim. We can do
better: the 95% C.I. for the earnings increment is
\[ 4 \times \left( 2.37 - 1.96 \times 0.10,\ 2.37 + 1.96 \times 0.10 \right) = 4 \times (2.174,\ 2.566) = (8.696,\ 10.264). \]
This confidence interval contains the counselor’s claim ($10). Therefore, the counselor’s claim
cannot be rejected under a significance level of 5%.
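A minimal sketch of the interval for the 4-year increment: because the increment is 4 × β1, both endpoints of the slope's confidence interval are simply scaled by 4. Variable names are illustrative.

```python
slope_hat, se_slope = 2.37, 0.10
extra_years = 4        # high school (12 years) to college (16 years)
z_crit = 1.96
counselor_claim = 10.0  # dollars per hour

point_estimate = extra_years * slope_hat
lower = extra_years * (slope_hat - z_crit * se_slope)
upper = extra_years * (slope_hat + z_crit * se_slope)
claim_is_consistent = lower <= counselor_claim <= upper

print(f"point estimate: {point_estimate:.2f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f}), contains $10: {claim_is_consistent}")
```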
Problem 3
(SW 5.7) Suppose (Xi, Yi) satisfy the least squares assumptions (LSA#1 ∼ 3). A random sample
of size n = 250 is drawn and yields:
\[ \hat{Y} = \underset{(3.1)}{5.4} + \underset{(1.5)}{3.2}\, X, \qquad R^2 = 0.26, \quad SER = 6.2. \]
(1) Test H0 : β1 = 0 vs H1 : β1 ≠ 0 at the 5% level.
(2) Construct a 95% confidence interval for β1.
(3) Suppose you learned that Yi and Xi are independent. Would you be surprised? Explain.
(4) Suppose Xi and Yi are independent and many samples of size n = 250 are drawn, regressions
estimated, and (1) and (2) answered. In what fraction of the samples would H0 from (1) be rejected?
In what fraction of samples would β1 = 0 be included in the confidence interval from (2)?
Ans:
(1) n = 250 is large, so we can use the normal approximation.
\[ t = \frac{3.2}{1.5} = 2.133. \]
The p-value is:
\[ \Pr(|z| > |t^{act}|) = \Pr(|z| > 2.133) \approx 0.03. \]
Since p-value = 0.03 < 0.05, H0 is rejected at the 5% level.
(2) A 95% confidence interval is:
\[ \left( 3.2 - 1.96 \times s.e.(\hat{\beta}_1),\ 3.2 + 1.96 \times s.e.(\hat{\beta}_1) \right) = (3.2 - 1.96 \times 1.5,\ 3.2 + 1.96 \times 1.5) = (0.26,\ 6.14). \]
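Parts (1) and (2) use the same two inputs, so they can be wrapped in one helper. The function `slope_inference` below is a hypothetical name for illustration, and it relies on the same large-sample normal approximation as the answers.

```python
from statistics import NormalDist

def slope_inference(estimate, std_error, null_value=0.0, z_crit=1.96):
    """Return (t-statistic, two-sided p-value, CI lower, CI upper)."""
    t_stat = (estimate - null_value) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    return t_stat, p_value, lower, upper

t_stat, p_value, lower, upper = slope_inference(3.2, 1.5)
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Zero lies outside the interval, which matches the rejection of H0 at the 5% level.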
(3) Yes, because independence implies no relationship between Xi and Yi, linear or nonlinear. Yet
we have rejected H0 at the 5% level, implying a linear relationship. This is surprising.
(4) Independence implies β1 = 0. Therefore, the question asks how often H0 : β1 = 0 is rejected
when it is in fact true. This is precisely the definition of the significance level of the test, so H0
will be rejected in 5% of the samples. Similarly, β1 = 0 would be included in the C.I. in 95% of the samples.
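The 5% and 95% figures can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption made for illustration: X and Y are drawn as independent standard normals, 1000 samples are used, the seed is arbitrary, and the standard error is computed with a heteroskedasticity-robust formula that includes an n/(n − 2) degrees-of-freedom adjustment. The simulated rates should land near 0.05 and 0.95, not exactly on them.

```python
import math
import random

def slope_and_robust_se(x, y):
    """OLS slope of y on x (with intercept) and a heteroskedasticity-robust SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared (demeaned x times residual) terms.
    meat = sum(((xi - x_bar) * (yi - intercept - slope * xi)) ** 2
               for xi, yi in zip(x, y))
    variance = (n / (n - 2)) * meat / sxx ** 2
    return slope, math.sqrt(variance)

def simulate(n=250, reps=1000, seed=0):
    """Fraction of samples rejecting H0: beta1 = 0, and fraction whose CI covers 0."""
    rng = random.Random(seed)
    rejections = 0
    covered = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # independent of x, so beta1 = 0
        slope, se = slope_and_robust_se(x, y)
        if abs(slope / se) > 1.96:
            rejections += 1
        if slope - 1.96 * se <= 0 <= slope + 1.96 * se:
            covered += 1
    return rejections / reps, covered / reps

reject_rate, cover_rate = simulate()
print(f"rejection rate: {reject_rate:.3f}, coverage rate: {cover_rate:.3f}")
```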
Problem 4
Suppose we have a randomly sampled (i.i.d.) dataset, a set of pairs (Wi, Di), i = 1, . . . n. Here, Di
is a dummy (or “binary” or “indicator”) variable such that Di = 1 if female, Di = 0 if male. Wi
represents weight of the individual, in kg.
The researcher uses this dataset to divide up (Wi, Di) into two subgroups: one where Di = 1 (the
female group), and one where Di = 0 (the male group). Let µfemale and µmale be the true
means of the female and male group weights. The researcher is interested in µfemale − µmale.
The researcher finds that:
nfemale = the sample size of the female group = 238
nmale = the sample size of the male group = 182
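\[ \bar{W}_{female} = \frac{1}{238} \sum_{i=1}^{238} W_i = 50.0, \qquad \bar{W}_{male} = \frac{1}{182} \sum_{i=1}^{182} W_i = 57.4, \]
where each sum runs over the members of the corresponding group.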
(1) Is \(\bar{W}_{female}\) an unbiased estimator for µfemale? Is \(\bar{W}_{male}\) an unbiased estimator for µmale? Why?
(2) Is \(\bar{W}_{female} - \bar{W}_{male}\) an unbiased estimator for µfemale − µmale? Why?
(3) Suppose \(V(W_i \mid D_i = 1) = \sigma_f^2\) and \(V(W_i \mid D_i = 0) = \sigma_m^2\). What are \(V(\bar{W}_{female})\) and \(V(\bar{W}_{male})\)?
(4) Are \(\bar{W}_{female}\) and \(\bar{W}_{male}\) independent? What is \(V(\bar{W}_{female} - \bar{W}_{male})\)?
(5) Suppose the sample analogues of \(\sigma_f^2\) and \(\sigma_m^2\) (i.e., \(S_f^2\), \(S_m^2\)) were computed to be:
\[ S_f^2 = 19.4^2, \qquad S_m^2 = 17.9^2. \]
Compute \(s.e.(\bar{W}_{female} - \bar{W}_{male})\).
Now, suppose the researcher ran the following regression model on the same dataset:
Wi = β0 + β1Di + ui,
where, as before, Di is a dummy (or “binary” or “indicator”) variable such that Di = 1 if female,
Di = 0 if male. Wi represents weight, in kg.
(6) Do you expect LSA#2 to hold in this regression?
(7) Assume that LSA#1∼#3 hold in this regression. What would you expect to get for \(\hat{\beta}_1\) and
\(s.e.(\hat{\beta}_1)\)? Explain.
Ans:
(1) Yes. Since (Wi, Di) are i.i.d. and subgroups were generated by Di = 1 or Di = 0, the Wi are
i.i.d. within each subgroup. Therefore, as per our previous lectures on the sampling distribution of
the sample mean, unbiasedness holds.
(2) Yes, \(E[\bar{W}_{female} - \bar{W}_{male}] = E[\bar{W}_{female}] - E[\bar{W}_{male}] = \mu_{female} - \mu_{male}\).
(3) \(V(\bar{W}_{female}) = \frac{\sigma_f^2}{n_{female}} = \frac{\sigma_f^2}{238}\) and \(V(\bar{W}_{male}) = \frac{\sigma_m^2}{n_{male}} = \frac{\sigma_m^2}{182}\). This works because we are assuming
i.i.d. sampling.
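(4) Yes, \(\bar{W}_{female}\) and \(\bar{W}_{male}\) are independent because (Wi, Di) are i.i.d. and the two sample
means are computed from disjoint sets of observations. Hence,
\[ \begin{aligned}
V(\bar{W}_{female} - \bar{W}_{male}) &= V(\bar{W}_{female} + (-\bar{W}_{male})) \\
&= V(\bar{W}_{female}) + V(-\bar{W}_{male}) + 2\,cov(\bar{W}_{female}, -\bar{W}_{male}) \\
&= V(\bar{W}_{female}) + V(\bar{W}_{male}) + 0 \quad (\because \text{independence}) \\
&= \frac{\sigma_f^2}{238} + \frac{\sigma_m^2}{182}.
\end{aligned} \]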
(5) Just replace \(\sigma_f^2\), \(\sigma_m^2\) with their sample analogues:
\[ \sqrt{\frac{S_f^2}{238} + \frac{S_m^2}{182}} = \sqrt{\frac{19.4^2}{238} + \frac{17.9^2}{182}} \approx 1.8. \]
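The standard error of the difference in means can be checked numerically. This is a minimal sketch with illustrative variable names; 19.4 and 17.9 are the sample standard deviations given in the problem.

```python
import math

n_female, n_male = 238, 182
s_female, s_male = 19.4, 17.9  # sample standard deviations (kg)

# Variances of independent sample means add, so the SE is the root of the sum.
se_diff = math.sqrt(s_female ** 2 / n_female + s_male ** 2 / n_male)
print(f"s.e. of the difference in means: {se_diff:.2f}")
```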
(6) Yes, because of the i.i.d. assumption on (Wi, Di).
(7) \(\hat{\beta}_1 = \bar{W}_{female} - \bar{W}_{male} = 50.0 - 57.4 = -7.4\) and \(s.e.(\hat{\beta}_1) = 1.8\), from question (5). The reasoning is straight
from the lecture notes (and SW): the estimated coefficient in the dummy variable regression
is precisely the group difference in sample means. In other words, the dummy variable regression
is just another way to do a difference in sample means analysis. Hence, the estimated difference in
means equals \(\hat{\beta}_1\), and \(s.e.(\hat{\beta}_1)\) equals the standard error of the difference in sample
means computed in (5).
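The equivalence between the dummy variable regression and the difference in means can be illustrated on a small made-up dataset. The nine weights below are hypothetical and are not the homework data. The sketch also makes two assumptions about the standard error: the regression SE uses the heteroskedasticity-robust formula with no degrees-of-freedom correction, and the group variances use divisor n_g rather than n_g − 1. Under those choices the two standard errors match exactly. With the usual corrections they differ slightly, and the gap is negligible at sample sizes like 238 and 182.

```python
import math

# Hypothetical toy data (NOT the homework dataset): weight in kg, female dummy.
weights = [52.0, 48.5, 55.0, 47.0, 60.0, 63.5, 58.0, 70.0, 66.5]
female = [1, 1, 1, 1, 0, 0, 0, 0, 0]

def mean(values):
    return sum(values) / len(values)

def variance_divisor_n(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

w_female = [w for w, d in zip(weights, female) if d == 1]
w_male = [w for w, d in zip(weights, female) if d == 0]

# Difference-in-means analysis.
diff_in_means = mean(w_female) - mean(w_male)
se_diff = math.sqrt(variance_divisor_n(w_female) / len(w_female)
                    + variance_divisor_n(w_male) / len(w_male))

# OLS of weight on an intercept and the dummy.
d_bar, w_bar = mean(female), mean(weights)
sdd = sum((d - d_bar) ** 2 for d in female)
beta1_hat = sum((d - d_bar) * (w - w_bar) for d, w in zip(female, weights)) / sdd
beta0_hat = w_bar - beta1_hat * d_bar

# Robust SE of the slope, without a degrees-of-freedom correction.
residuals = [w - beta0_hat - beta1_hat * d for w, d in zip(weights, female)]
se_beta1 = math.sqrt(sum(((d - d_bar) * u) ** 2
                         for d, u in zip(female, residuals))) / sdd

print(f"difference in means: {diff_in_means:.4f}, beta1_hat: {beta1_hat:.4f}")
print(f"s.e. (difference): {se_diff:.4f}, s.e. (beta1_hat): {se_beta1:.4f}")
```

The intercept estimate equals the male group mean, since the male group is the one with Di = 0.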