Date: 2022-12-12
Answer to Homework #3
1. True or false? Briefly explain.
(1) One can use R² as a goodness-of-fit measure in the selection of regressors.
False. R² always increases when more independent variables are added and thus cannot be
used as a goodness-of-fit measure in the selection of regressors.
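The point can be illustrated numerically. Below is a minimal sketch (not part of the answer key; the data are simulated) showing that R² does not fall even when a pure-noise regressor is added:

```python
# Sketch: R-squared never decreases when a regressor is added, even an
# irrelevant one, so it cannot guide the selection of regressors.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def r_squared(y, X):
    """R^2 from an OLS fit of y on X (X includes the constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

X_small = np.column_stack([np.ones(n), x1])
noise = rng.normal(size=n)                 # pure-noise regressor
X_big = np.column_stack([X_small, noise])

r2_small = r_squared(y, X_small)
r2_big = r_squared(y, X_big)
print(r2_small, r2_big)                    # r2_big >= r2_small always
```

Least squares can only lower the sum of squared residuals when a column is added, which is why the inequality holds mechanically.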
(2) When there is a single regressor, R² and adjusted R² are the same.
False. R² and adjusted R² are only equal when the regression involves no regressors. To see this
formally, recall that, by definition,

    adjusted R² = 1 − [(n − 1)/(n − k − 1)](1 − R²).

Thus, R² and adjusted R² are NOT equal unless k = 0.
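A quick numeric sketch of the formula (illustrative values only) confirms that the two measures coincide exactly when k = 0 and diverge otherwise:

```python
# Adjusted R^2 from the definition above; equals R^2 only when k = 0.
def adj_r2(r2, n, k):
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

print(adj_r2(0.5, 40, 0))   # 0.5: identical when k = 0
print(adj_r2(0.5, 40, 1))   # below 0.5: strictly smaller once k >= 1 and R^2 < 1
```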
2. Exercise 7.7 (page 230)
(a) The t-statistic is 0.485/2.61 = 0.186 < 1.96. Therefore, the coefficient on BDR is not statistically
significantly different from zero.
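The arithmetic behind the t-statistic (estimate 0.485, standard error 2.61, both from the exercise) can be checked directly:

```python
# t-statistic for the coefficient on BDR: estimate / standard error.
t = 0.485 / 2.61
print(round(t, 3))   # 0.186, well below the 5% two-sided critical value 1.96
```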
(b) The coefficient on BDR measures the partial effect of the number of bedrooms holding house
size (Hsize) constant. Yet, the typical 5-bedroom house is much larger than the typical
2-bedroom house. Thus, the results in (a) say little about the conventional wisdom.
(c) The 99% confidence interval for the effect of lot size on price is 2000 × [.002 ± 2.58 × .00048], or
1.52 to 6.48 (in thousands of dollars).
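A quick check of the interval endpoints, using the coefficient (.002), standard error (.00048), and 1% critical value (2.58) given above:

```python
# 99% CI for the effect of a 2000 sq ft change in lot size on price.
lo = 2000 * (0.002 - 2.58 * 0.00048)
hi = 2000 * (0.002 + 2.58 * 0.00048)
print(round(lo, 2), round(hi, 2))   # 1.52 6.48 (thousands of dollars)
```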
(d) Choosing the scale of the variables should be done to make the regression results easy to read
and to interpret. If lot size were measured in thousands of square feet, the estimated
coefficient would be 2 instead of 0.002.
(e) The 10% critical value from the F(2, ∞) distribution is 2.30. Because 0.08 < 2.30, the coefficients
are not jointly significant at the 10% level.
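The critical value can be recovered with the standard library alone: F(2, ∞) is a chi-squared variable with 2 degrees of freedom divided by 2, and the χ²₂ CDF is 1 − e^(−x/2), so the 10% critical value is −ln(0.10):

```python
# 10% critical value of F(2, inf) = (chi-squared_2 quantile at 0.90) / 2
# chi-squared_2 CDF is 1 - exp(-x/2), so the quantile is -2*ln(0.10),
# and the F critical value is -ln(0.10).
import math

crit = -math.log(0.10)
print(round(crit, 2))   # 2.30; the actual F of 0.08 is far below it
```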
3. Using F test statistics we discussed in class, do Exercise 7.9 (page 230).
(a) Step 1: State the hypotheses
Null hypothesis: β1 = β2
Alternative hypothesis: β1 ≠ β2
Step 2: Estimate the unrestricted regression Yi = β0 + β1X1i + β2X2i + ui and find its
sum of squared residuals, SSRU.
Step 3: Estimate the restricted regression Yi = β0 + β1(X1i + X2i) + ui and find its
sum of squared residuals, SSRR.
Step 4: Plug SSRR and SSRU into the F-statistic formula, where q is the number of
restrictions and k is the number of regressors in the unrestricted model:

    F = [(SSRR − SSRU)/q] / [SSRU/(n − k − 1)]

Compare the actual F statistic to the critical value. If the actual F statistic is greater
than the critical value, reject the null hypothesis.
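The Step 4 formula (the homoskedasticity-only F statistic) can be sketched as a small function; the numbers in the example call are illustrative only, not from the exercise:

```python
# Homoskedasticity-only F statistic from restricted/unrestricted SSRs:
# q restrictions, n observations, k regressors in the unrestricted model.
def f_stat(ssr_r, ssr_u, q, n, k):
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))

# Illustrative numbers only: one restriction, n = 103, k = 2.
print(round(f_stat(ssr_r=120.0, ssr_u=100.0, q=1, n=103, k=2), 2))   # 20.0
```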
(b) Restricted regression is Yi = β0 + β2(X2i − aX1i) + ui
(c) Restricted regression is Yi − X1i = β0 + β2(X2i − X1i) + ui
4. See the lecture note for the two consequences of multicollinearity.
5.
(1)
The coefficient on da is −0.007: for a one-unit increase in da, the distance from the
parcel to the airport, we expect a 0.7% decrease in p, the price, holding the other variables constant.
The coefficient on mo is 0.020: for a one-unit increase in mo, the number of months it takes to sell
the parcel, we expect a 2% increase in p, the price, holding the other variables constant.
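The "0.7% decrease" reading rests on the usual log-level approximation: a one-unit change in da moves log(p) by −0.007, and for small coefficients the exact percentage change is close to that. A quick check:

```python
# Exact percentage effect implied by a log-level coefficient of -0.007:
# 100 * (exp(b) - 1), compared with the approximation 100 * b = -0.7%.
import math

exact = math.exp(-0.007) - 1
print(round(100 * exact, 3))   # about -0.698, close to the -0.7% approximation
```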
(2) The critical value for a two-sided t test at the 5% significance level is 2.035. We can see from the
table: |t| of wl = |1.13| < 2.035; |t| of da = |−0.25| < 2.035; |t| of d75 = |−2.00| < 2.035. So the coefficients
on wl, da, and d75 are not significantly different from zero at the 5% level.
(3) display "95 CI for mo is [" _b[mo]-_se[mo]*invttail(33, 0.025) ", " _b[mo]+_se[mo]*invttail(33, 0.025) "]"
The 95% confidence interval for the coefficient on mo is [β̂mo − 2.035 × SE(β̂mo), β̂mo + 2.035 × SE(β̂mo)], that is, [.0104, .0300].
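The endpoints follow directly from the coefficient (.0202226) and standard error (.0048238) reported in the output, with the t critical value 2.035:

```python
# 95% CI for the coefficient on mo, using the reported estimate and SE.
lo = 0.0202226 - 2.035 * 0.0048238
hi = 0.0202226 + 2.035 * 0.0048238
print(f"{lo:.4f} {hi:.4f}")   # 0.0104 0.0300
```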
      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  5,    33) =   10.03
       Model |  6.30160081     5  1.26032016           Prob > F      =  0.0000
    Residual |  4.14460685    33  .125594147           R-squared     =  0.6032
-------------+------------------------------           Adj R-squared =  0.5431
       Total |  10.4462077    38  .274900202           Root MSE      =  .35439

------------------------------------------------------------------------------
        logp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          wl |   .1492065   .1323043     1.13   0.268    -.1199687    .4183816
          da |  -.0069925   .0275574    -0.25   0.801    -.0630585    .0490735
         d75 |  -.0661488   .0331367    -2.00   0.054    -.1335659    .0012683
        loga |  -.2769022   .0479683    -5.77   0.000    -.3744943     -.17931
          mo |   .0202226   .0048238     4.19   0.000     .0104085    .0300368
       _cons |   9.143132   .3621494    25.25   0.000     8.406333     9.87993
------------------------------------------------------------------------------
(4) From the table, we see that F( 5, 33) = 10.03. From the F table, we get the critical value of 2.030.
Since F( 5, 33) = 10.03 > 2.030, we reject the null hypothesis that all of the slope coefficients in this
regression are equal to zero at the 10% significance level.
(5)
*Generate a new variable RSSU
reg logp loga wl da d75 mo
gen RSSU = e(rss)
*Generate a new variable RSSR, without wl and d75
reg logp loga da mo
gen RSSR = e(rss)
*Calculate F-statistic
gen F = ((RSSR-RSSU)/2)/(RSSU/33)
sum F
We see F(2, 33) = 2.11, compared with the critical value 5.312: since F(2, 33) = 2.11 < 5.312, we
fail to reject the null hypothesis that the coefficients on wl and d75 are jointly equal to zero at
the 1% significance level.
(6) With d75, Adj-R2 = 0.5431; w/o d75, Adj-R2 = 0.5030. The rule of thumb is to choose the model
with the highest adjusted R-squared. Since 0.5431 > 0.5030, we should include d75 in our regression.
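The Adj R-squared in the regression output above can be reproduced from its R-squared using the formula from question 1, with n = 39 observations and k = 5 regressors:

```python
# Adjusted R^2 from the table's R^2 = .6032 with n = 39, k = 5:
# 1 - (n - 1)/(n - k - 1) * (1 - R^2).
adj = 1 - (39 - 1) / (39 - 5 - 1) * (1 - 0.6032)
print(round(adj, 4))   # 0.5431, matching the Stata output
```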
(7) & (8) First, run regression as full model and check its AIC & SIC value:
AIC0 = 35.25, SIC0 = 45.23
Since the coefficients on wl and da are not statistically significantly different from zero, dropping
them may be a good starting point.
Second, run regression w/o wl and da and check its AIC & SIC value:
AIC1 = 33.05, SIC1 = 39.70
Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
    . |     39   -29.65081   -11.62441      6   35.24881   45.23018

Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
    . |     39   -29.65081   -12.52509      4   33.05019   39.70443
The rule of thumb for the AIC is that the preferred model is the one with the minimum AIC
value. After a few more trials, you will find that this model still has the lowest AIC. The SIC
leads to the same choice.
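The AIC and BIC values in the output follow from the reported log likelihood and degrees of freedom via the standard definitions AIC = −2·ll + 2·df and BIC = −2·ll + ln(N)·df:

```python
# Reproduce the full model's AIC and BIC from ll(model) = -11.62441,
# df = 6, and N = 39 observations.
import math

ll, df, n = -11.62441, 6, 39
aic = -2 * ll + 2 * df
bic = -2 * ll + math.log(n) * df
print(round(aic, 2), round(bic, 2))   # 35.25 45.23, matching the output
```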
(9) Note first that the t-statistic for mo is 4.19, meaning that mo is a relevant variable. As we know,
omitting the relevant variable mo gives biased estimates of the coefficients on the included variables.
(10) Adding irrelevant variables gives unbiased but inefficient coefficient estimates; hypothesis
tests based on the resulting inflated standard errors are therefore misleading.