Stata ghostwriting - KT501 - Assignment 2 | 学霸联盟


Date: 2022-01-21

KT501 Cross Sectional and Panel Data Analysis
Assignment 2, 2021 – Answer Key
Question 1 (12 marks)
This question refers to the paper by Caskey, J. P., & Peterson, A. (1994). Who has a bank account and who
doesn’t: 1977 and 1989. Eastern Economic Journal, 20(1), 61-73. The paper can be found in the folder
\\iuj-home\IR materials\Wong\CPDA\assignment.
(a) (2 marks) What is the research question of this paper?
The authors examined how changes in the socioeconomic characteristics of households affected the ownership
of bank accounts in the US between 1977 and 1989.
(b) (3 marks) Why is the probit model suitable for this study? Briefly describe the dataset used in this
paper.
Please refer to pp. 62-64.
(c) (3 marks) In Table 5 on page 68, what does “Implied slope” mean?
On p. 67, the authors mentioned that “The reported slopes in Table 5 measure the implied effect of a one-
unit change in the relevant independent variable on the probability that a household has a bank account,
where these marginal effects are calculated at the means of the right-hand side variables”, i.e., the implied
slope is the partial effect at the average (PEA) of a particular independent variable (treating that
independent variable as a discrete variable, i.e., measuring the effect of a one-unit change).
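To make the “implied slope” concrete, here is a minimal Python sketch assuming a probit model and purely hypothetical numbers (the index value at the means, xbar_b, and the coefficient b_j are made up, not taken from Table 5). The discrete version compares Φ at the means before and after a one-unit increase in x_j; the calculus version φ(x̄β)β_j is printed for comparison.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def pea_discrete(index_at_means: float, beta_j: float) -> float:
    """One-unit change in x_j at the means: Phi(xbar*b + b_j) - Phi(xbar*b)."""
    return norm_cdf(index_at_means + beta_j) - norm_cdf(index_at_means)


def pea_derivative(index_at_means: float, beta_j: float) -> float:
    """Calculus version of the PEA: phi(xbar*b) * b_j."""
    return norm_pdf(index_at_means) * beta_j


# Hypothetical values, NOT taken from Caskey and Peterson (1994)
xbar_b = 0.8  # probit index evaluated at the sample means
b_j = 0.3     # probit coefficient on some regressor x_j

discrete = pea_discrete(xbar_b, b_j)
derivative = pea_derivative(xbar_b, b_j)
print(f"discrete PEA   = {discrete:.4f}")
print(f"derivative PEA = {derivative:.4f}")
```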
(d) (4 marks) For 1977, what is the effect of marital status on ownership of a bank account? And for 1989?
What is the reason for this difference suggested by the authors?
The estimated coefficient of “Married” is insignificant in 1977 but statistically significant at the 10% level in
1989. The PEAs of “Married” are 0.008 and 0.023 in 1977 and 1989, respectively. The results imply that,
compared with unmarried households, the predicted probability that a married household has a bank account
is 0.8 and 2.3 percentage points higher in 1977 and 1989, respectively. On p. 66, the authors mention that “… some of
the variables, …, are correlated with account ownership primarily because they are linked to family wealth.”
In 1989, marital status may be correlated with family wealth and hence have a positive effect on bank
account ownership.
Question 2 (13 marks)
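(a) (2 marks) Estimate equation (1) using fixed effects (FE) and report the results.
The FE results are given by the ". xtreg ln_wage union age agesq ttl_exp tenure, fe" output reproduced under part (e).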
(b) (3 marks) How many individuals are used in the FE estimation? How many waves were included in the
panel data? How many total observations would be used if each individual had data on all variables for
all the waves?
Number of individuals = 3,753; number of waves = 8 (the maximum number of observations per individual); total
observations for a balanced panel = 3,753 × 8 = 30,024.
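These counts can be read off the xtreg header. A trivial Python check, assuming the number of waves equals the maximum number of observations per group (8):

```python
N_OBS = 15_044    # "Number of obs" in the xtreg header
N_GROUPS = 3_753  # "Number of groups" (idcode)
N_WAVES = 8       # assumed equal to the "max" obs per group

balanced_total = N_GROUPS * N_WAVES       # observations if every person had all waves
avg_obs_per_person = N_OBS / N_GROUPS     # matches "avg" obs per group in the header
print(balanced_total, round(avg_obs_per_person, 1))
```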
(c) (2 marks) Would you choose dummy variable regression method to estimate equation (1)? Why?
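No. The number of cross-sectional units is too large (3,753 individuals), so the dummy variable regression would
require 3,752 individual dummies in addition to the constant. The FE (within) estimator gives exactly the same
slope estimates without estimating all of these intercepts.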
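The claim that the within transformation gives the same slope as the dummy variable regression can be checked on a toy example. This is a minimal pure-Python sketch on made-up data (the ids, x and y values are hypothetical, not the assignment dataset); it uses one dummy per individual and no constant, which is equivalent to a constant plus N − 1 dummies.

```python
from collections import defaultdict


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def within_slope(ids, x, y):
    """FE (within) slope: demean x and y by individual, then OLS without a constant."""
    sx, sy, cnt = defaultdict(float), defaultdict(float), defaultdict(int)
    for i, xi, yi in zip(ids, x, y):
        sx[i] += xi
        sy[i] += yi
        cnt[i] += 1
    num = den = 0.0
    for i, xi, yi in zip(ids, x, y):
        xd = xi - sx[i] / cnt[i]
        yd = yi - sy[i] / cnt[i]
        num += xd * yd
        den += xd * xd
    return num / den


def lsdv_slope(ids, x, y):
    """Dummy-variable slope: OLS of y on x plus one dummy per individual (no constant)."""
    groups = sorted(set(ids))
    rows = [[xi] + [1.0 if i == g else 0.0 for g in groups] for i, xi in zip(ids, x)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)[0]  # first element is the coefficient on x


# Hypothetical unbalanced toy panel: 3 individuals with 2 or 3 waves each
ids = [1, 1, 1, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 4.0, 0.5, 1.5, 3.0, 3.5, 5.0]
y = [2.1, 2.9, 5.2, 0.8, 1.9, 4.4, 4.7, 6.3]

print(within_slope(ids, x, y), lsdv_slope(ids, x, y))
```

With 3,753 individuals the dummy version would need a 3,754 × 3,754 system per regressor set, while the within version only needs group means.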
(d) (3 marks) Interpret the coefficient on union and comment on its significance.
The coefficient estimate on union is 0.0803, which implies that, holding age, total work experience and tenure
fixed (and controlling for the individual fixed effect), women who are union members earn approximately 8.03%
more than women who are not. This effect is statistically significant at the 0.1% level since the p-value is 0.000.
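Because the dependent variable is ln_wage, 100 × 0.0803 is the usual log-point approximation. A minimal Python sketch comparing it with the exact proportional difference 100·[exp(b) − 1], using the FE coefficient from the log:

```python
import math

B_UNION = 0.0803123  # FE coefficient on union from xtreg ..., fe

approx_pct = 100 * B_UNION                 # log-point approximation: 100 * b
exact_pct = 100 * (math.exp(B_UNION) - 1)  # exact: 100 * [exp(b) - 1]
print(f"approximate: {approx_pct:.2f}%  exact: {exact_pct:.2f}%")
```

For a coefficient this small the two differ by only about a third of a percentage point.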
. xtreg ln_wage union age agesq ttl_exp tenure, re
Random-effects GLS regression Number of obs = 15,044
Group variable: idcode Number of groups = 3,753
R-sq: Obs per group:
within = 0.1190 min = 1
between = 0.2538 avg = 4.0
overall = 0.1999 max = 8
Wald chi2(5) = 2730.72
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union | .108287 .0073726 14.69 0.000 .093837 .122737
age | .0112945 .005664 1.99 0.046 .0001932 .0223959
agesq | -.0004251 .0000834 -5.10 0.000 -.0005885 -.0002616
ttl_exp | .0422961 .0014509 29.15 0.000 .0394523 .0451398
tenure | .0099027 .0009365 10.57 0.000 .0080672 .0117383
_cons | 1.444054 .0951115 15.18 0.000 1.257639 1.630469
-------------+----------------------------------------------------------------
sigma_u | .35413339
sigma_e | .24517153
rho | .67599596 (fraction of variance due to u_i)
------------------------------------------------------------------------------
(e) (3 marks) Estimate equation (1) using random effects (RE) and report the results. Conduct the Hausman
test to compare the FE model with the RE model and report the test results. Based on the test results, would
you recommend fixed-effects or random-effects estimation? Why?
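The chi-square statistic from the Hausman test is 182.44 with a p-value of 0.0000, so H0 (the difference in
coefficients is not systematic) is rejected. The test results imply that the key RE assumption, that the unobserved
individual effect $a_i$ (u_i in the Stata output) is uncorrelated with the explanatory variables $x_{itj}$, is false. I would
therefore recommend fixed-effects estimation, since the FE estimates remain consistent when $a_i$ is correlated with the regressors.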
. xtreg ln_wage union age agesq ttl_exp tenure, fe
Fixed-effects (within) regression Number of obs = 15,044
Group variable: idcode Number of groups = 3,753
R-sq: Obs per group:
within = 0.1198 min = 1
between = 0.2420 avg = 4.0
overall = 0.1928 max = 8
F(5,11286) = 307.19
corr(u_i, Xb) = 0.0621 Prob > F = 0.0000
------------------------------------------------------------------------------
ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
union | .0803123 .0080053 10.03 0.000 .0646205 .0960041
age | .0130806 .0059647 2.19 0.028 .0013888 .0247724
agesq | -.0004755 .0000866 -5.49 0.000 -.0006452 -.0003058
ttl_exp | .0431125 .0026202 16.45 0.000 .0379764 .0482486
tenure | .0076494 .001033 7.41 0.000 .0056245 .0096742
_cons | 1.453375 .102555 14.17 0.000 1.252349 1.654401
-------------+----------------------------------------------------------------
sigma_u | .39701415
sigma_e | .24517153
rho | .72392748 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3752, 11286) = 9.30 Prob > F = 0.0000
. hausman fixed random
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| fixed random Difference S.E.
-------------+----------------------------------------------------------------
union | .0803123 .108287 -.0279747 .0031193
age | .0130806 .0112945 .001786 .0018698
agesq | -.0004755 -.0004251 -.0000504 .0000232
ttl_exp | .0431125 .0422961 .0008164 .0021819
tenure | .0076494 .0099027 -.0022534 .0004359
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 182.44
Prob>chi2 = 0.0000
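The Hausman statistic in the log is H = (b − B)′[V_b − V_B]^(-1)(b − B). The full value of 182.44 uses the whole matrix V_b − V_B, including covariances the log does not print, so it cannot be reproduced from the table alone. The minimal pure-Python sketch below (with a hand-written Gaussian-elimination solver) implements the formula and applies it to the union coefficient only, using the reported fixed and random estimates and the S.E. of their difference; this one-coefficient version is a chi-squared(1) test on that single coefficient.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hausman(diff, v_diff):
    """H = d' V^{-1} d with d = b_FE - b_RE and V = V_FE - V_RE.

    Under H0, H is chi-squared with len(d) degrees of freedom."""
    w = solve(v_diff, diff)  # w = V^{-1} d, computed without forming the inverse
    return sum(di * wi for di, wi in zip(diff, w))


# One-coefficient version for union, from the hausman output above
d_union = 0.0803123 - 0.108287  # fixed minus random
se_diff = 0.0031193             # sqrt(diag(V_b - V_B)) for union
h_union = hausman([d_union], [[se_diff ** 2]])
print(f"chi2(1) for union = {h_union:.2f}")
```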
Question 3 (Stata exercise) (13 marks)
(a) (1 mark) What fraction of the individuals in the sample are eligible for participation in a 401(k) plan?
=1 if |
eligble for |
401(k) | Freq. Percent Cum.
------------+-----------------------------------
0 | 5,638 60.79 60.79
1 | 3,637 39.21 100.00
------------+-----------------------------------
Total | 9,275 100.00

The fraction of eligible individuals is 39.21%.
(b) (4 marks) Should you use heteroskedasticity-robust standard errors to estimate the LPM? Why?
Estimate the linear probability model (LPM) of equation (1) and report the estimated coefficients with
(appropriate) standard errors and other important statistics. Interpret the estimate of $\beta_3$ and comment on
its significance.
Yes, we should use heteroskedasticity-robust standard errors. When the dependent variable y is binary, its
variance conditional on $\mathbf{x}$ is $\mathrm{Var}(y \mid \mathbf{x}) = p(\mathbf{x})[1 - p(\mathbf{x})]$, where $p(\mathbf{x}) = P(y = 1 \mid \mathbf{x})$. Therefore, unless the probability
does not depend on any of the independent variables, there must be heteroskedasticity in the LPM, and hence
we should use heteroskedasticity-robust standard errors.
The estimate of $\beta_3$ (the coefficient on male) is -0.022 with a p-value of 0.082, i.e., less than 10%. The results indicate
that, holding the other variables fixed, the probability that a male is eligible for a 401(k) plan is on average 2.2 percentage
points lower than for a female, and this effect is statistically significant at the 10% level.
. reg e401k age agesq male fsize inc incsq, robust

Linear regression Number of obs = 9,275
F(6, 9268) = 179.92
Prob > F = 0.0000
R-squared = 0.0962
Root MSE = .46431

------------------------------------------------------------------------------
| Robust
e401k | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .02988 .0038892 7.68 0.000 .0222563 .0375038
agesq | -.0003464 .0000446 -7.76 0.000 -.0004338 -.0002589
male | -.0220944 .0126829 -1.74 0.082 -.0469557 .0027668
fsize | -.015407 .0033834 -4.55 0.000 -.0220393 -.0087747
inc | .0126414 .0006001 21.06 0.000 .011465 .0138177
incsq | -.0000627 5.00e-06 -12.55 0.000 -.0000725 -.0000529
_cons | -.5282495 .0785802 -6.72 0.000 -.682284 -.3742151
------------------------------------------------------------------------------
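To see the heteroskedasticity argument with the estimates above, here is a minimal Python sketch that computes fitted probabilities from the LPM coefficients for two hypothetical individuals (a 40-year-old woman in a two-person family, at two income levels; inc is assumed to be measured in $1,000s) and the implied conditional variance p(1 − p).

```python
# LPM estimates from: reg e401k age agesq male fsize inc incsq, robust
COEF = {
    "_cons": -0.5282495, "age": 0.02988, "agesq": -0.0003464,
    "male": -0.0220944, "fsize": -0.015407, "inc": 0.0126414, "incsq": -0.0000627,
}


def lpm_prob(age, male, fsize, inc):
    """Fitted P(e401k = 1 | x) from the linear probability model."""
    return (COEF["_cons"] + COEF["age"] * age + COEF["agesq"] * age ** 2
            + COEF["male"] * male + COEF["fsize"] * fsize
            + COEF["inc"] * inc + COEF["incsq"] * inc ** 2)


def bernoulli_var(p):
    """Var(y | x) = p(x) * [1 - p(x)] for a binary outcome y."""
    return p * (1.0 - p)


# Hypothetical individuals who differ only in income (inc assumed in $1,000s)
p_low = lpm_prob(age=40, male=0, fsize=2, inc=10)
p_high = lpm_prob(age=40, male=0, fsize=2, inc=60)
print(f"p = {p_low:.3f}, Var = {bernoulli_var(p_low):.3f}")
print(f"p = {p_high:.3f}, Var = {bernoulli_var(p_high):.3f}")
```

The error variance changes with income, which is exactly the heteroskedasticity that the robust option accounts for.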
(c) (5 marks) Estimate equation (1) by probit and logit also. Report the estimated coefficients with standard
errors and other important statistics from probit and logit. Why are the coefficient estimates from LPM,
probit and logit different from each other? Why can’t you interpret the coefficient estimates from probit
and logit models as the partial effects of the independent variables?
The estimates from the LPM differ from those of probit and logit because the LPM is a linear model while probit and
logit are nonlinear models. The probit and logit estimates differ from each other because the logit model
uses the cumulative distribution function of the logistic distribution while the probit model uses the cumulative
distribution function of the standard normal distribution, so the two sets of coefficients are on different scales.
The coefficient estimates reported after estimating a probit or logit model measure the effects of the
independent variables $x_j$ on the latent variable $y^*$. But $y^*$ is only the underlying latent propensity for the
outcome to equal one, and it rarely has a well-defined unit of measurement. The sign of a coefficient gives the
direction of the effect of $x_j$, but the coefficient is not the partial effect of $x_j$ on the probability of success
$P(y = 1 \mid \mathbf{x})$, which equals $g(\mathbf{x}\boldsymbol{\beta})\beta_j$ and therefore varies with $\mathbf{x}$.

. probit e401k age agesq male fsize inc incsq

Iteration 0: log likelihood = -6211.3846
Iteration 1: log likelihood = -5751.4913
Iteration 2: log likelihood = -5749.8658
Iteration 3: log likelihood = -5749.8657

Probit regression Number of obs = 9,275
LR chi2(6) = 923.04
Prob > chi2 = 0.0000
Log likelihood = -5749.8657 Pseudo R2 = 0.0743

------------------------------------------------------------------------------
e401k | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0861003 .0114578 7.51 0.000 .0636434 .1085573
agesq | -.0009991 .0001319 -7.57 0.000 -.0012577 -.0007405
male | -.0654625 .0362208 -1.81 0.071 -.136454 .005529
fsize | -.0451931 .0098575 -4.58 0.000 -.0645134 -.0258727
inc | .035228 .0017015 20.70 0.000 .0318931 .038563
incsq | -.0001783 .0000133 -13.36 0.000 -.0002045 -.0001522
_cons | -2.903148 .2343165 -12.39 0.000 -3.3624 -2.443896
------------------------------------------------------------------------------

. logit e401k age agesq male fsize inc incsq

Iteration 0: log likelihood = -6211.3846
Iteration 1: log likelihood = -5755.1554
Iteration 2: log likelihood = -5751.7145
Iteration 3: log likelihood = -5751.7124
Iteration 4: log likelihood = -5751.7124

Logistic regression Number of obs = 9,275
LR chi2(6) = 919.34
Prob > chi2 = 0.0000
Log likelihood = -5751.7124 Pseudo R2 = 0.0740

------------------------------------------------------------------------------
e401k | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .1423354 .0189958 7.49 0.000 .1051044 .1795665
agesq | -.0016509 .0002187 -7.55 0.000 -.0020796 -.0012222
male | -.1051015 .0597047 -1.76 0.078 -.2221206 .0119176
fsize | -.0744097 .0163817 -4.54 0.000 -.1065171 -.0423022
inc | .0574525 .0028693 20.02 0.000 .0518287 .0630763
incsq | -.0002921 .0000226 -12.92 0.000 -.0003364 -.0002477
_cons | -4.768696 .3899412 -12.23 0.000 -5.532967 -4.004426
------------------------------------------------------------------------------
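One way to see that the probit and logit coefficients differ mainly in scale is to divide one set by the other. A common rule of thumb (Wooldridge) is that logit coefficients are roughly 1.6 times probit coefficients, because the logistic density at zero (0.25) is about 1.6 times smaller than the standard normal density at zero (about 0.4). A minimal Python sketch using the two outputs above:

```python
# Coefficients copied from the probit and logit outputs above (constant omitted)
probit = {"age": 0.0861003, "agesq": -0.0009991, "male": -0.0654625,
          "fsize": -0.0451931, "inc": 0.035228, "incsq": -0.0001783}
logit = {"age": 0.1423354, "agesq": -0.0016509, "male": -0.1051015,
         "fsize": -0.0744097, "inc": 0.0574525, "incsq": -0.0002921}

# Ratio of logit to probit coefficient for each regressor
ratios = {k: logit[k] / probit[k] for k in probit}
for name, r in ratios.items():
    print(f"{name:6s} logit/probit = {r:.3f}")
```

All ratios are close to 1.6 and all signs agree, consistent with the two models describing the same effects on different scales.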
(d) (3 marks) Calculate the average partial effects of family size (fsize) for fsize = 1, 2, 3, …, 8 using the
results from the logit model. Interpret the average partial effect of family size at fsize = 4. Why is the
average partial effect of family size not constant?

At family size (fsize) = 4, the average partial effect of family size is -0.0157. The result indicates that, when
family size is 4, one additional family member decreases the probability of being eligible for a 401(k) plan by
1.57 percentage points on average, and this effect is statistically significant since the p-value is less than 5%.
The average partial effect of family size is not constant because, in the logit model, the partial effect of $x_j$ is
$\beta_j\,g(\mathbf{x}\boldsymbol{\beta})$ with $g(z) = \Lambda(z)[1 - \Lambda(z)]$. It therefore depends not only on the estimated coefficient but also
on the values of all the independent variables (including fsize), and these values change as fsize is set to 1, 2, …, 8.

------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fsize |
_at |
1 | -.0163125 .0036329 -4.49 0.000 -.0234328 -.0091921
2 | -.0161405 .0035672 -4.52 0.000 -.0231321 -.0091489
3 | -.0159391 .003482 -4.58 0.000 -.0227638 -.0091145
4 | -.0157097 .0033785 -4.65 0.000 -.0223315 -.009088
5 | -.0154538 .003258 -4.74 0.000 -.0218393 -.0090682
6 | -.0151729 .003122 -4.86 0.000 -.0212918 -.0090539
7 | -.0148689 .002972 -5.00 0.000 -.0206939 -.0090438
8 | -.0145436 .0028099 -5.18 0.000 -.0200508 -.0090363
------------------------------------------------------------------------------
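The logit APE formula behind the table can be sketched in Python. This is a minimal illustration, not a reproduction of the Stata numbers: it uses the logit coefficient on fsize from the output above, but the index values x_i·b (with the fsize term removed) are hypothetical, since the individual data are not shown here. Setting fsize = k for everyone shifts every index by b_fsize·k, which changes Λ′ and hence the APE.

```python
import math


def logistic_cdf(z):
    """Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z):
    """Lambda'(z) = Lambda(z) * [1 - Lambda(z)]."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


def logit_ape(beta_j, indices):
    """APE of a continuous x_j: beta_j times the sample average of Lambda'(x_i b)."""
    return beta_j * sum(logistic_pdf(z) for z in indices) / len(indices)


B_FSIZE = -0.0744097  # logit coefficient on fsize

# Hypothetical values of x_i b with the fsize term removed (NOT from the 401(k) data)
base_index = [-1.2, -0.6, 0.1, 0.7]

for k in range(1, 9):
    # Setting fsize = k for everyone adds B_FSIZE * k to every index
    at_k = [z + B_FSIZE * k for z in base_index]
    print(k, round(logit_ape(B_FSIZE, at_k), 5))
```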
Question 4 (Stata exercise) (12 marks)
(a) (3 marks) In the sample, for what percentage of the workers in the sample is pension equal to zero? What
is the range of pension for workers with nonzero pension benefits? Why is a Tobit model appropriate for
modeling pension?

There are 616 workers in the sample, and 27.9% of them (172 individuals) have zero pension benefits. For the
444 workers with positive pension benefits, pension ranges from 7.28 to 2,880.27 dollars. Because a nontrivial
fraction of the sample has pension equal to zero while the positive values of pension take on a wide range of values,
a Tobit model is appropriate for modelling pension benefits.
(b) (3 marks) Estimate a Tobit model explaining pension in terms of exper, age, tenure, educ, depends, married,
white, and male. Report the estimation results. Do you think the males have higher expected pension
benefits than the females? Why?
Yes. The regression results indicate that being male increases expected pension benefits relative to being female
(the coefficient on male is 308.15), and the effect is statistically significant (p-value < 0.01%).
[Note: be careful that the coefficient is NOT the partial effect of male on expected pension benefits. It tells you
only the direction of the effect, not by how much it changes the actual outcome.]
. /* gen dummy variable for pospension, =1 if pension>0, =0 otherwise */
. /* pospension : pension with positive values */
. gen pospension = pension>0

. tab pospension
------------------------------------------------
Pospension | Freq. Percent Cum.
------------+-----------------------------------
0 | 172 27.92 27.92
1 | 444 72.08 100.00
------------+-----------------------------------
Total | 616 100.00
------------------------------------------------

. sum pension if pension >0
---------------------------------------------------------------------------------------------
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
pension | 444 905.0439 550.3696 7.28 2880.27
---------------------------------------------------------------------------------------------
. tobit pension exper age tenure educ depends married white male, ll(0)

--------------------------------------------------------------------------------
Tobit regression Number of obs = 616
Uncensored = 444
Limits: lower = 0 Left-censored = 172
upper = +inf Right-censored = 0

LR chi2(8) = 184.70
Prob > chi2 = 0.0000
Log likelihood = -3672.9635 Pseudo R2 = 0.0245

--------------------------------------------------------------------------------
pension | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
exper | 5.203458 6.009514 0.87 0.387 -6.598467 17.00538
age | -4.638944 5.710964 -0.81 0.417 -15.85455 6.576666
tenure | 36.02385 4.564528 7.89 0.000 27.05969 44.988
educ | 93.21262 10.89176 8.56 0.000 71.82258 114.6027
depends | 35.28461 21.91775 1.61 0.108 -7.759075 78.3283
married | 53.68858 71.7354 0.75 0.454 -87.19067 194.5678
white | 144.0855 102.0792 1.41 0.159 -56.3851 344.5562
male | 308.1505 69.89297 4.41 0.000 170.8895 445.4114
_cons | -1252.429 219.0781 -5.72 0.000 -1682.67 -822.1873
---------------+----------------------------------------------------------------
var(e.pension)| 459329.1 32721.65 399360.6 528302.6
--------------------------------------------------------------------------------
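The note in part (b) can be made concrete with the standard Tobit formulas for a model left-censored at 0, as estimated with ll(0): E(y|x) = Φ(xb/σ)·xb + σφ(xb/σ), and the partial effect of a continuous x_j is b_j·Φ(xb/σ). A minimal Python sketch, using σ and the male coefficient from the output above but a hypothetical index value xb for one worker (not computed from the data):

```python
import math


def norm_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def tobit_mean(xb, sigma):
    """E(y | x) for y = max(0, y*), with y* | x ~ Normal(xb, sigma^2)."""
    c = xb / sigma
    return norm_cdf(c) * xb + sigma * norm_pdf(c)


def tobit_pe(beta_j, xb, sigma):
    """Partial effect of a continuous x_j on E(y | x): beta_j * Phi(xb / sigma)."""
    return beta_j * norm_cdf(xb / sigma)


SIGMA = math.sqrt(459329.1)  # square root of var(e.pension) in the tobit output
B_MALE = 308.1505            # tobit coefficient on male

# Hypothetical index x b for a female worker (NOT computed from the pension data)
xb = 400.0

# male is a dummy, so its effect on E(pension | x) is a discrete difference
pe_male = tobit_mean(xb + B_MALE, SIGMA) - tobit_mean(xb, SIGMA)
print(f"coefficient = {B_MALE:.2f}, effect on E(pension|x) = {pe_male:.2f}")
```

Because Φ lies between 0 and 1, the effect has the same sign as the coefficient but a smaller magnitude.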
(c) (3 marks) Use the results from part (b) to estimate the average partial effects of educ at educ equals 4 to 19
(using 5 years as the interval) on the expected value of pension (i.e., E(pension | x), taking into account that pension ≥ 0).
Interpret the average partial effect of educ at educ=14.
At educ = 12 (the margins output below uses educ = 4, 8, 12, 16), the average partial effect of educ is 68.01, which is
statistically significant (p-value < 0.001). The result indicates that, for people with 12 years of education, one more
year of education increases expected pension benefits by 68.01 dollars on average.
(d) (3 marks) If you re-estimate the model by treating it as a linear model and using OLS, will you find the
same partial effects as you reported in part (c)? Why?
No. In this analysis, 27.9% of the workers in the sample have zero pension benefits, i.e., a nontrivial fraction of
the observations on pension pile up (are censored) at zero. Since we cannot observe the negative values of the
latent variable pension*, regressing the observed pension on the independent variables by OLS gives biased
estimates. The Tobit model, by contrast, uses the latent variable framework to handle the zeros in the dependent
variable, so its estimates are consistent (under its normality and homoskedasticity assumptions). Moreover, OLS
imposes a constant partial effect, whereas Tobit partial effects vary with the values of the regressors: OLS gives a
single educ coefficient of 70.55, while the Tobit APEs of educ rise from 33.57 at educ = 4 to 80.66 at educ = 16.
Therefore, the partial effects obtained from the OLS estimation and from the Tobit model are different.
The regression results from OLS estimation are shown below for reference (not required in the answer):

. margins, dydx(educ) predict(ystar(0,.)) at(educ=(4(4)19))

Average marginal effects Number of obs = 616
Model VCE : OIM

Expression : E(pension*|pension>0), predict(ystar(0,.))
dy/dx w.r.t. : educ
1._at : educ = 4
2._at : educ = 8
3._at : educ = 12
4._at : educ = 16
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ |
_at |
1 | 33.57479 1.320379 25.43 0.000 30.98689 36.16269
2 | 51.21524 3.777809 13.56 0.000 43.81087 58.61961
3 | 68.01 7.676695 8.86 0.000 52.96396 83.05605
4 | 80.65916 10.35261 7.79 0.000 60.36842 100.9499
------------------------------------------------------------------------------
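As we understand Stata's tobit postestimation documentation, predict(ystar(0,.)) targets E[max(0, pension*) | x], whose derivative with respect to educ is b_educ·Φ(x·b/σ). Under that assumption, dividing each reported APE by the educ coefficient gives the sample-average Φ at that education level. A minimal Python sketch using the numbers above:

```python
B_EDUC = 93.21262  # tobit coefficient on educ
APE = {4: 33.57479, 8: 51.21524, 12: 68.01, 16: 80.65916}  # margins, ystar(0,.)

# Assuming margins targets E(pension | x) = E[max(0, pension*) | x], each APE
# equals B_EDUC times the sample average of Phi(x_i b / sigma) at that educ value.
avg_phi = {e: ape / B_EDUC for e, ape in APE.items()}
for e, s in avg_phi.items():
    print(f"educ = {e:2d}: implied average Phi = {s:.3f}")
```

The implied scale factor rises with education because more workers move away from the zero corner, which is why the APE of educ is not constant.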
. reg pension exper age tenure educ depends married white male

Source | SS df MS Number of obs = 616
-------------+---------------------------------- F(8, 607) = 29.03
Model | 65233955.7 8 8154244.46 Prob > F = 0.0000
Residual | 170501392 607 280891.914 R-squared = 0.2767
-------------+---------------------------------- Adj R-squared = 0.2672
Total | 235735348 615 383309.508 Root MSE = 529.99
------------------------------------------------------------------------------
pension | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
exper | 3.824058 4.319262 0.89 0.376 -4.658454 12.30657
age | -2.910444 4.084039 -0.71 0.476 -10.93101 5.110117
tenure | 27.23766 3.437175 7.92 0.000 20.48746 33.98786
educ | 70.54624 8.079536 8.73 0.000 54.679 86.41348
depends | 35.10034 16.33034 2.15 0.032 3.029513 67.17116
married | 13.90625 53.28493 0.26 0.794 -90.73895 118.5515
white | 114.9365 75.07571 1.53 0.126 -32.50321 262.3762
male | 272.9529 51.97504 5.25 0.000 170.8801 375.0256
_cons | -735.0276 159.8468 -4.60 0.000 -1048.947 -421.1078
------------------------------------------------------------------------------