Date: 2023-06-07
ECMT1020 Introduction to Econometrics 2022S1
Final Questions and Answers
1 Multiple choice questions (10 points)
There is only one correct answer to each question. Each question is worth 2 points.
1. (Probability limit) A random variable $X$ has population mean $\mu \neq 0$ and population variance
$\sigma^2$. A random sample of $n$ observations, $X_1, \dots, X_n$, is generated. Which of the following
statements is incorrect?
(a) The average of even-numbered observations as an estimator for µ is unbiased. (correct)
(b) The inverse of the sample mean is a consistent estimator for 1/µ. (correct)
(c) The sample mean is an unbiased estimator for µ. (correct)
(d) The average of odd-numbered observations as an estimator for µ is not consistent. (incorrect)
2. (Hypothesis test) Suppose the population mean of a random variable X is µ. We use a random
sample of X to test the null hypothesis H0 : µ = 0. For a given significance level, which of the
following statements is incorrect?
(a) The critical value of the test changes when we switch from a two-sided test to a one-sided
test. (correct)
(b) If we reject the two-sided test, we must also reject the one-sided test. (correct)
(c) If we could not reject the one-sided test, then we still may reject the two-sided test. (incorrect)
(d) Even if we could not reject the two-sided test, we still may reject the one-sided test. (correct)
3. (Adjusted $R^2$) Which of the following statements is incorrect?
(a) The adjusted $R^2$ can be greater than one. (incorrect)
(b) The adjusted $R^2$ does not always increase as we include more regressors. (correct)
(c) The adjusted $R^2$ can never be greater than $R^2$. (correct)
(d) The adjusted $R^2$ can take negative values. (correct)
(e) The $R^2$ will never fall as we include more regressors. (correct)
4. (IV) Suppose X is one of the explanatory variables in a multiple regression, and it is associated
with a measurement error which is correlated with the disturbance term. A researcher proposes
to use another variable Z as an instrument for X. Which of the following is not a requirement
for Z to be a valid instrumental variable?
(a) Z should be correlated with X. (it is a requirement)
(b) Z should not be correlated with the disturbance term. (it is a requirement)
(c) Z should not be one of the explanatory variables in the regression. (it is a requirement)
(d) Z should not be correlated with the other explanatory variables in the regression. (it is not a requirement)
5. (Measurement error) Suppose that a variable Y depends on a variable Z but Z is measured
with measurement error. That is, the true relationship is
Y = β1 + β2Z + v
but Z is measured as X = Z + w where w is the measurement error. We run the regression
Y = β1 + β2X + u
to obtain the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$. Which of the following statements is correct?
(a) If w is independent of both Z and v, then $\hat{\beta}_2$ is consistent for $\beta_2$. (incorrect)
(b) If w and Z are correlated but w is independent of v, then $\hat{\beta}_2$ is consistent for $\beta_2$. (incorrect)
(c) If w and v are correlated but w is independent of Z, then $\hat{\beta}_2$ is consistent for $\beta_2$. (incorrect)
(d) Even if w is independent of both Z and v, $\hat{\beta}_2$ is inconsistent for $\beta_2$. (correct)
2 Multiple answer questions (10 points)
There may be multiple answers, please select all that apply. Each question is worth 2 points.
1. (Statistics) Let X be a random variable with variance 1. Suppose that we have a random sample
of X with only two observations, X1 and X2. We construct an estimator Z = 0.6X1 + 0.6X2
for the population mean of X. Please select all correct statement(s).
(a) Z is a biased estimator for the population mean of X. (correct)
(b) Z is an unbiased estimator for the population mean of X. (incorrect)
(c) The variance of Z is 0.5. (incorrect)
(d) The mean squared error of Z is 0.64. (incorrect)
(e) The mean squared error of Z is greater than 0.64. (correct)
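The marked answers for this question can be checked with exact arithmetic. The sketch below is illustrative only: the helper name `linear_estimator_moments` and the trial value µ = 1 are assumptions made for the example, and the observations are taken to be independent, as in a random sample.

```python
from fractions import Fraction

def linear_estimator_moments(weights, mu, var):
    """Bias, variance and MSE of sum(w_i * X_i) as an estimator of mu,
    for independent draws X_i with mean mu and variance var."""
    total = sum(weights)
    bias = (total - 1) * mu                        # E(Z) - mu
    variance = sum(w * w for w in weights) * var   # independence: variances add
    mse = variance + bias ** 2                     # MSE = variance + bias^2
    return bias, variance, mse

w = [Fraction(3, 5), Fraction(3, 5)]               # Z = 0.6*X1 + 0.6*X2
bias, variance, mse = linear_estimator_moments(w, mu=Fraction(1), var=Fraction(1))
print(bias, variance, mse)
```

The variance is 0.72 whatever the value of µ, so the mean squared error is 0.72 + 0.04µ², which is at least 0.72 and therefore greater than 0.64. The bias is 0.2µ, which is non-zero unless µ = 0.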
2. (Single regression) Consider the simple regression
CARE = β1 + β2CHILDREN + u
where the dependent variable is the number of minutes a person spends each day caring for
household members, and the independent variable is the number of children the person has.
The estimation results are shown in the table below, with some values intentionally removed:
Based on the above table, please select the correct statement(s).
(a) Both removed p-values in the table are less than 0.05. (correct)
(b) The removed p-value of the test for the significance of the intercept is greater than 0.05.
(incorrect)
(c) The removed confidence interval in the table covers zero. (incorrect)
(d) The removed value of the slope coefficient is 32.16. (correct)
(e) The removed value of the slope coefficient is 50.25. (incorrect)
(f) The removed value of the t statistic of the intercept is 11.04. (incorrect)
3. (Multiple regression) In a multiple regression, other things being equal, the estimated coefficient
$\hat{\beta}_j$ is less accurate when
(a) the variance of the disturbance term is larger. (correct)
(b) the variance of the disturbance term is smaller. (incorrect)
(c) the corresponding regressor is more correlated with other regressors. (correct)
(d) the corresponding regressor is less correlated with other regressors. (incorrect)
(e) the mean squared deviation of the corresponding regressor is smaller. (correct)
(f) the mean squared deviation of the corresponding regressor is larger. (incorrect)
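Each of the three options marked correct corresponds to one factor in the standard expression for the variance of an OLS slope coefficient in a multiple regression with a homoskedastic disturbance term. The symbols $\sigma_u^2$ (disturbance variance), $\mathrm{MSD}(X_j)$ (mean squared deviation of the regressor) and $R_j^2$ (the $R^2$ from regressing $X_j$ on the other regressors) are notation introduced for this sketch.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma_u^2}{n \,\mathrm{MSD}(X_j)} \cdot \frac{1}{1 - R_j^2},
\qquad
\mathrm{MSD}(X_j) = \frac{1}{n}\sum_{i=1}^{n} (X_{ji} - \bar{X}_j)^2 .
```

A larger $\sigma_u^2$, a larger $R_j^2$ or a smaller $\mathrm{MSD}(X_j)$ each raises the variance, which matches options (a), (c) and (e).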
4. (Dummy) We want to build a regression model for analyzing the school cost function. In our
data set, we have the following variables:
C = annual cost for running the school
S = number of students in the school
$$R = \begin{cases} 1, & \text{if the school is a residential school} \\ 0, & \text{if the school is a non-residential school} \end{cases}$$
$$N = \begin{cases} 0, & \text{if the school is a residential school} \\ 1, & \text{if the school is a non-residential school} \end{cases}$$
Suppose that we believe that the residential/non-residential feature affects the overhead school
cost, and each additional student incurs a constant marginal cost for running the school. Please
select the correct model specification(s):
(a) C = β1 + β2S + β3R + β4N + u (incorrect)
(b) C = β1 + β2S + β3N + u (correct)
(c) C = β2S + β3N + u (incorrect)
(d) C = β2S + β3R + u (incorrect)
(e) C = β2S + β3R + β4N + u (correct)
5. (Specification) Please select from below the situation(s) that will cause the OLS estimator of
the linear regression model to be biased in general.
(a) Relevant variables are omitted. (correct)
(b) Redundant regressors are included in the model. (incorrect)
(c) There is heteroskedasticity in the disturbance term. (incorrect)
(d) There is a measurement error in the dependent variable, and the measurement error is
correlated with the regressors. (correct)
(e) There is a measurement error in the dependent variable, and the measurement error is not
correlated with the regressors. (incorrect)
3 Numerical answer questions (20 points)
Each question is worth 2 points.
1. (Probability) Suppose X is a random variable indicating the face value when a single unfair
die is thrown. The probability distribution of X is given by
$$\Pr(X = x) = \frac{7 - x}{21}, \qquad x = 1, 2, 3, 4, 5, 6.$$
Let Y be another random variable given as Y = a+ bX. What is the covariance of λX and Y ?
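Answer: The covariance of λX and Y is
$$\operatorname{Cov}(\lambda X, Y) = \operatorname{Cov}(\lambda X, a + bX) = \lambda b \operatorname{Cov}(X, X) = \lambda b \operatorname{Var}(X).$$
The variance of X can be computed from the probability distribution of X using the formula
$$\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{196}{21} - \left(\frac{56}{21}\right)^2 \approx 2.22.$$
So the covariance of λX and Y is 2.22λb.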
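The moments used in this answer can be reproduced exactly from the probability distribution. This is a minimal sketch; the function name `cov_scaled` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction

# Pr(X = x) = (7 - x) / 21 for x = 1, ..., 6
pmf = {x: Fraction(7 - x, 21) for x in range(1, 7)}
assert sum(pmf.values()) == 1                      # probabilities sum to one

mean = sum(x * p for x, p in pmf.items())          # E(X)
mean_sq = sum(x * x * p for x, p in pmf.items())   # E(X^2)
var_x = mean_sq - mean ** 2                        # Var(X) = E(X^2) - [E(X)]^2

def cov_scaled(lam, b):
    """Cov(lam*X, a + b*X) = lam * b * Var(X); the constant a drops out."""
    return lam * b * var_x

print(mean, mean_sq, var_x, float(var_x))
```

The exact variance is 20/9, which rounds to the 2.22 quoted in the answer.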
2. For two random variables X and Y , we know that X = a+ bY . Suppose Z is a third random
variable, and the correlation coefficient between X and Z is ρ. What is the correlation coefficient
between Y and Z?
Answer : The correlation coefficient between Y and Z is sgn(b)·ρ.
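A short derivation of this result, assuming $b \neq 0$ and writing $\sigma_X$, $\sigma_Y$, $\sigma_Z$ for the standard deviations and $\rho_{XZ}$, $\rho_{YZ}$ for the two correlation coefficients (notation introduced here for the sketch):

```latex
\operatorname{Cov}(X, Z) = b\,\operatorname{Cov}(Y, Z), \qquad \sigma_X = |b|\,\sigma_Y
\;\Longrightarrow\;
\rho = \rho_{XZ} = \frac{b\,\operatorname{Cov}(Y, Z)}{|b|\,\sigma_Y \sigma_Z}
     = \operatorname{sgn}(b)\,\rho_{YZ}
\;\Longrightarrow\;
\rho_{YZ} = \operatorname{sgn}(b)\,\rho .
```

The last step uses the fact that $\operatorname{sgn}(b)$ is either $1$ or $-1$, so dividing by it is the same as multiplying by it.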
3. Consider the Stata summary statistics and regression output (some values are deliberately
removed).
Based on the above information, fill in the following missing values:
• Model df: 4
• Residual df: 269− 5 = 264
• Total df: 268
4. (Nonlinear models) Suppose that the logarithm of Y is regressed on the logarithm of X, and
the fitted OLS regression is
$$\widehat{\log Y} = \hat{\beta}_1 + \hat{\beta}_2 \log X.$$
Suppose that we define a new regressor $X^* = \mu X$ and run OLS regression of $\log Y$ on $\log X^*$.
What is the value of the intercept in the new regression?
Answer : This is an application of Exercise 4.8 in the textbook. The intercept in the new
regression is $\hat{\beta}_1^* = \hat{\beta}_1 - \hat{\beta}_2 \log \mu$.
5. (Nonlinear models) Suppose that the logarithm of Y is regressed on the logarithm of X, and
the fitted OLS regression is
$$\widehat{\log Y} = \hat{\beta}_1 + \hat{\beta}_2 \log X$$
and the t ratio of the slope coefficient is λ. Suppose that we define a new regressor $X^* = \mu X$
and run OLS regression of $\log Y$ on $\log X^*$. What is the t ratio of the slope coefficient in the
new regression?
Answer : This is an application of Exercise 4.8 in the textbook. The t-ratio of the slope coefficient
does not change.
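The answers to questions 4 and 5 follow from $\log X^* = \log \mu + \log X$, and they can be checked numerically. The sketch below uses made-up data and a hypothetical scaling factor `mu = 2.5`; the helper `simple_ols` is written for this illustration and is not taken from the textbook.

```python
import math
import random

def simple_ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, t ratio of slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)      # standard error of the slope
    return intercept, slope, slope / se_slope

random.seed(0)                                     # made-up data, illustration only
mu = 2.5                                           # hypothetical scaling factor
X = [random.uniform(1, 10) for _ in range(50)]
logX = [math.log(v) for v in X]
logY = [0.7 + 1.3 * lx + random.gauss(0, 0.2) for lx in logX]
logXstar = [math.log(mu * v) for v in X]           # log X* = log(mu) + log X

b1, b2, t = simple_ols(logX, logY)
b1s, b2s, ts = simple_ols(logXstar, logY)
print(b1s - (b1 - b2 * math.log(mu)))              # zero up to rounding error
print(b2s - b2, ts - t)                            # both zero up to rounding error
```

Rescaling X only shifts the regressor by the constant $\log \mu$, so the slope, its standard error and its t ratio are unchanged and the intercept absorbs the shift.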
6. (Nonlinear models) Consider the following regression model
House price = β1 + β2 Size + β3 Bedrooms + β4 Size · Bedrooms + u.
Suppose the estimated coefficients are all different from zero. We plot
A. the predicted relation between house price and size for 2-bedroom houses, and
B. the predicted relation between house price and size for 3-bedroom houses.
Select from the list: “Plot A and plot B (do not have/have) the same intercept, and plot A and
plot B (do not have/have) the same slope.”
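Substituting Bedrooms = 2 and Bedrooms = 3 into the fitted equation gives the two predicted lines as functions of Size. This is a sketch of the reasoning, with hats denoting the estimated coefficients:

```latex
\text{A: } \widehat{\text{House price}}
  = (\hat{\beta}_1 + 2\hat{\beta}_3) + (\hat{\beta}_2 + 2\hat{\beta}_4)\,\text{Size},
\qquad
\text{B: } \widehat{\text{House price}}
  = (\hat{\beta}_1 + 3\hat{\beta}_3) + (\hat{\beta}_2 + 3\hat{\beta}_4)\,\text{Size}.
```

The intercepts differ by $\hat{\beta}_3$ and the slopes differ by $\hat{\beta}_4$, so when all estimated coefficients are non-zero the two plots share neither the intercept nor the slope.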
7. (Dummy) Consider a simple OLS regression
EARNINGS = β1 + β2F + u
where the dependent variable is hourly earnings (in dollars), and F = 1 if the individual is a
female and F = 0 if the individual is a male.
We also know that
• the average hourly earnings for the whole sample is x1;
• the average hourly earnings for males is x2;
• the average hourly earnings for females is x3.
Based on the above information, what will the estimate of β1 and β2 be for the above regression?
Answer : The estimate of β1 is x2 and the estimate of β2 is x3 − x2.
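A quick numerical check of this result, using a small made-up sample (the earnings figures and the helper name `dummy_regression` are assumptions for illustration only):

```python
def dummy_regression(y, d):
    """OLS of y on a constant and a 0/1 dummy d; returns (intercept, slope)."""
    n = len(y)
    dbar, ybar = sum(d) / n, sum(y) / n
    sdd = sum((di - dbar) ** 2 for di in d)
    sdy = sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y))
    slope = sdy / sdd
    return ybar - slope * dbar, slope

earnings = [20.0, 24.0, 22.0, 26.0, 18.0, 16.0]    # hypothetical hourly earnings
female = [0, 0, 0, 0, 1, 1]                        # F = 1 for females

male_mean = sum(e for e, f in zip(earnings, female) if f == 0) / female.count(0)
female_mean = sum(e for e, f in zip(earnings, female) if f == 1) / female.count(1)

b1, b2 = dummy_regression(earnings, female)
print(b1, male_mean)                  # intercept equals the male average
print(b2, female_mean - male_mean)    # slope equals the female-male difference
```

The whole-sample average x1 is not needed: the intercept reproduces the average for the reference group (males) and the slope reproduces the difference between the group averages.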
8. (Specification test) Suppose that we want to explain an individual’s educational attainment, measured
by the years of schooling S, by the education level of their parents (SM and SF) and the
individual’s cognitive ability measured by the scores of tests of arithmetic reasoning, A, word
knowledge, W, and paragraph comprehension, P. In particular, consider the regression model
S = β1 + β2SM + β3SF + β4A + β5W + β6P + u.
To test the hypothesis
H0 : β4 = 2β5 and β4 = 2β6
we fitted the following two regressions using a sample with n individuals and obtained the
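following results:
$$\hat{S} = \hat{\beta}_1 + \hat{\beta}_2 SM + \hat{\beta}_3 SF + \hat{\beta}_4 A + \hat{\beta}_5 W + \hat{\beta}_6 P$$
$$\hat{S} = \hat{\gamma}_1 + \hat{\gamma}_2 SM + \hat{\gamma}_3 SF + \hat{\beta}_4 C, \quad \text{where } C = A + \tfrac{1}{2} W + \tfrac{1}{2} P.$$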
The RSS was RSSU for the first regression and RSSR for the second regression. What is the
value of the F statistic for testing the above null hypothesis?
Answer : The F statistic is
$$F = \frac{\text{improvement of goodness of fit}/\text{extra DF}}{\text{remaining RSS}/\text{remaining DF}} = \frac{(RSS_R - RSS_U)/2}{RSS_U/(n - 6)}.$$
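The formula can be wrapped in a small helper. This is a sketch only: the function name `f_statistic` and the numbers in the usage line are hypothetical and do not come from the exam.

```python
def f_statistic(rss_r, rss_u, n_restrictions, n_obs, n_params_u):
    """F statistic for linear restrictions, from restricted and unrestricted RSS."""
    if rss_r < rss_u:
        raise ValueError("restricted RSS cannot be below unrestricted RSS")
    numerator = (rss_r - rss_u) / n_restrictions   # improvement in fit per restriction
    denominator = rss_u / (n_obs - n_params_u)     # remaining RSS per remaining DF
    return numerator / denominator

# Hypothetical values: n = 106, RSS_R = 120, RSS_U = 100, two restrictions,
# six parameters in the unrestricted model.
print(f_statistic(120.0, 100.0, n_restrictions=2, n_obs=106, n_params_u=6))
```

In this question the number of restrictions is 2 and the unrestricted model has 6 parameters, so the statistic is compared with an F(2, n − 6) critical value.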
9. (GQ test) A researcher investigating whether government expenditure tends to crowd out
investment uses data on 30 countries to fit the regression
$$\hat{I} = 18.1 - 1.07G + 0.36Y$$
where I is the investment, G is the government recurrent expenditure, and Y is GDP. She
sorts the observations by increasing size of Y and fits the regression again for the 11 countries
with the smallest Y and the 11 countries with the largest Y. The RSS for these two regressions is
$RSS_1$ and $RSS_2$, respectively. Suppose she wants to use the Goldfeld-Quandt test to test
$H_0 : RSS_1 = RSS_2$ against $RSS_1 < RSS_2$, and obtains from her data that $RSS_1 = R_1$ and
$RSS_2 = R_2$.
Fill in the blank: “The test statistic follows an F(11 − 3, 11 − 3) distribution under the null
hypothesis. The realized test statistic in her sample is ____.”
Answer : The GQ test statistic is
$$GQ = \frac{RSS_2}{RSS_1} = \frac{R_2}{R_1}$$
in this case.
10. (Weighted LS) Suppose the true regression model is
Yi = β1 + β2Xi + ui
and it happens that the variance $\sigma_{u_i}^2$ of the disturbance term for the ith observation is proportional
to the value of $X_i$. To improve the efficiency of the estimation, we define a new variable
Z = Y/X, and obtain a fitted weighted least squares regression:
$$\hat{Z}_i = \hat{\beta}_2 + \hat{\beta}_1 \frac{1}{X_i}$$
Fill in the blanks: “The weighted least squares estimator for β1 is $\hat{\beta}_1$ and the weighted least
squares estimator for β2 is $\hat{\beta}_2$.”
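Dividing the true model through by $X_i$ shows why the two coefficients swap roles in the transformed regression. The notation $Z_i = Y_i/X_i$ follows the question, and $u_i/X_i$ is the transformed disturbance term:

```latex
\frac{Y_i}{X_i} = \beta_1 \frac{1}{X_i} + \beta_2 + \frac{u_i}{X_i}
\quad\Longleftrightarrow\quad
Z_i = \beta_2 + \beta_1 \frac{1}{X_i} + \frac{u_i}{X_i}.
```

The intercept of the transformed regression therefore estimates $\beta_2$, and the coefficient on $1/X_i$ estimates $\beta_1$. The transformed disturbance $u_i/X_i$ has variance $\sigma_{u_i}^2 / X_i^2$, which is constant when the standard deviation of $u_i$ is proportional to $X_i$.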
4 Short answer questions (10 points)
1. (5pt) Suppose we want to test the null hypothesis
H0 : β3 = β5 = 0
for understanding whether X3 and X5 are both redundant in the following multiple regression
model
Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + u.
You would like to construct an F test using a sample of 30 observations on Y,X2, X3, X4, X5.
Please
(1) write down explicitly the alternative hypothesis, (1pt)
(2) explain how you would obtain the test statistic, (2pt)
(3) find the critical value of the 5 percent test, (0.5pt)
(4) explain how you would make the testing decision for the 5 percent test, (0.5pt)
(5) explain what you would conclude if the null hypothesis is rejected. (1pt)
Answer :
(1) The alternative hypothesis is
H1 : β3 ≠ 0 or β5 ≠ 0.
(2) We run the following restricted and unrestricted models:
restricted : Y = β1 + β2X2 + β4X4 + u,
unrestricted : Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + u.
and obtain the residual sum of squares (RSS) of the two regressions which are denoted,
respectively, as RSSR and RSSU (1pt). Since there are 5 parameters in the unrestricted
model and there are 2 restrictions to test, the F test statistic is (1pt)
$$F(2, 30 - 5) = \frac{(RSS_R - RSS_U)/2}{RSS_U/(30 - 5)} = \frac{(RSS_R - RSS_U)/2}{RSS_U/25}.$$
(3) The 5% critical value of the F(2, 25) distribution is 3.3852.
(4) To make the testing decision, we compare the realized F test statistic with the critical value
3.3852, and reject the null hypothesis if the F test statistic is greater than the critical value.
Note that it is acceptable if students use the p-value to make the testing decision for this part.
(5) If the null hypothesis is rejected, then that means at least one of β3 and β5 is significantly
different from zero, and hence X3 and X5 are not both redundant.
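The critical value in part (3) can be reproduced without statistical tables, because the F distribution has a closed-form tail probability when the numerator degrees of freedom equal 2. The sketch below relies on that special case only; the function names are hypothetical helpers, and for other numerator degrees of freedom a statistical library or table would be needed.

```python
def f_crit_two_numerator_df(alpha, df2):
    """Upper-alpha critical value of F(2, df2).

    Uses the closed form P(F > x) = (1 + 2x/df2) ** (-df2/2),
    which holds only when the numerator degrees of freedom equal 2.
    """
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

def reject_null(f_stat, alpha=0.05, df2=25):
    """Decision rule from part (4): reject if the statistic exceeds the critical value."""
    return f_stat > f_crit_two_numerator_df(alpha, df2)

crit = f_crit_two_numerator_df(0.05, 25)
print(round(crit, 4))
```

With 25 denominator degrees of freedom and a 5 percent significance level, this agrees with the 3.3852 quoted in the answer.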
2. (5pt) A researcher has data on output per worker, Y , and capital per worker, K, both measured
in thousands of dollars, for 50 firms in the textile industry in 2012. She hypothesises that output
per worker depends on capital per worker and perhaps also the technological sophistication of
the firm, TECH :
Y = β1 + β2K + β3TECH+ u
where u is a disturbance term. She is unable to measure TECH and decides to use expenditure
per worker on research and development in 2012, R&D, as a proxy for it. She fits the following
regressions (standard errors in parentheses):
$$\hat{Y} = \underset{(0.45)}{1.02} + \underset{(0.04)}{0.32}\,K, \qquad R^2 = 0.749$$
$$\hat{Y} = \underset{(0.61)}{0.34} + \underset{(0.22)}{0.29}\,K + \underset{(0.15)}{0.05}\,R\&D, \qquad R^2 = 0.750.$$
The correlation coefficient for K and R&D was 0.92.
Please discuss the regression results under each of the following two assumptions:
(1) assuming that Y does depend on both K and TECH. (2.5pt)
(2) assuming that Y depends only on K. (2.5pt)
Answer :
(1) If Y depends on both K and TECH, the first specification is subject to omitted variable
bias and the standard errors are invalid. Specifically, the bias of the coefficient of K is
$$E(\hat{\beta}_2) - \beta_2 = \beta_3 \cdot \gamma_{K,TECH}$$
where
$$\gamma_{K,TECH} = \frac{\sum_{i=1}^{n} (K_i - \bar{K})(TECH_i - \overline{TECH})}{\sum_{i=1}^{n} (K_i - \bar{K})^2}$$
has the same sign as the sample correlation coefficient for K and TECH. Since the correlation
coefficient for K and R&D, the proxy for TECH, is 0.92, it is likely that the correlation
coefficient for K and TECH is also positive. Moreover, since the technological sophistication
of the firm (TECH) should have a positive impact on the output per worker (Y),
it is reasonable to assume that β3 is positive. Therefore, if we assume that β3 > 0 and
$\gamma_{K,TECH} > 0$, then the bias of $\hat{\beta}_2$ is positive, or $\hat{\beta}_2$ is upward biased. That is, the partial
effect of K on Y is likely to be lower than the 0.32 estimated in the first specification.
If Y depends on both K and TECH, the second specification is correct and the parameter
estimates are unbiased. However, all the standard errors are very large, making the parameter
estimates not statistically significant. A plausible explanation of the large standard
errors is that this regression is subject to severe multicollinearity.
When marking:
• 0.5pt: identifying the omitted variable bias problem in the first specification.
• 1pt: sensible reasoning of the sources of the omitted variable bias (β3 and $\gamma_{K,TECH}$)
and their signs.
• 0.5pt: the conclusion of upward bias of $\hat{\beta}_2$ and that the true partial effect of K on Y is
lower than 0.32.
• 0.5pt: mentioning the standard errors in the second regression are very large.
(2) If Y depends only on K, then the first specification is correct, the parameter estimates are
unbiased, and the standard errors are valid. The second specification, on the other hand,
includes a redundant variable R&D. The inclusion of this redundant variable gives rise to
efficiency loss, but it does not affect the unbiasedness of the parameter estimates or the
validity of the standard errors. Since the standard errors in both regressions are valid, they
can be compared. It is evident that the standard errors in the second regression are a lot
higher, demonstrating the severe efficiency loss.
When marking:
• 0.5pt: mentioning the first specification is correct and parameter estimates are unbiased.
• 0.5pt: mentioning the redundant variable in the second specification.
• 1pt: correct statements about the consequences of including a redundant variable: parameter
estimates are still unbiased, standard errors are valid, but there is an efficiency
loss.
• 0.5pt: mentioning that the standard errors in the second regression are indeed higher
than those in the first regression, demonstrating the efficiency loss.
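The loss of precision discussed in both parts can be quantified from the reported figures. The sketch below computes the t ratios implied by the two fitted regressions and the variance inflation factor implied by a correlation of 0.92 between K and R&D. The helper names are hypothetical, and the inflation factor is an other-things-equal illustration rather than part of the marking scheme.

```python
def t_ratio(coef, se):
    """t ratio for the null hypothesis that the coefficient is zero."""
    return coef / se

def variance_inflation(r):
    """1 / (1 - r^2) for two regressors with sample correlation r."""
    return 1.0 / (1.0 - r * r)

t_k_first = t_ratio(0.32, 0.04)      # K in the first regression
t_k_second = t_ratio(0.29, 0.22)     # K in the second regression
t_rd_second = t_ratio(0.05, 0.15)    # R&D in the second regression
vif = variance_inflation(0.92)

print(t_k_first, t_k_second, t_rd_second, vif)
```

The t ratio of K falls from about 8 in the first regression to about 1.3 in the second, and the correlation of 0.92 alone implies a variance roughly 6.5 times larger than it would be with uncorrelated regressors, other things being equal.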