ECON5121: Microeconometrics: Impact Evaluation and Causal Analysis
Fall Semester, 2021
Unit 3 – Sample Selection Models
Oct. 12, 2021

Sample Selection
• We now turn to the problem of using only a subset of a random sample obtained from a well-defined population (presumably, the one of interest).
• An obvious but important point: there cannot be an issue of non-random sample selection if a random sample has been obtained from the given population. The population is not immutable: we can choose a population of interest from a bigger population.
• For example, if we are interested in the effect of a job training program on a population of men with poor labor market histories, we can define the population based on observed past labor market outcomes, such as unemployment status or labor earnings. If we can collect a random sample from the defined population, we just apply standard methods.
• Sample selection becomes an issue when the sample we can obtain is not representative of the population of interest.

Sample Selection (cont.)
• As an example, suppose we are interested in a wealth equation,
  wealth = β0 + β1 educ + β2 age + β3 income + u,
which describes the population of all families in the United States (where educ and age refer to the self-described "household head"). If we assume that u has zero mean and is uncorrelated with each explanatory variable, then we would use OLS given a random sample from the population.
• Suppose, though, that only people less than 65 years old were sampled. What happens if we use OLS on the selected sample?

Sample Selection (cont.)
• As we will see, OLS on the nonrandom sample nevertheless consistently estimates the βj, provided E(u | educ, age, income) = 0.
• Zero correlation is not enough! We must have the conditional mean correctly specified.
• Next suppose that only families with wealth greater than zero are included in the sample. Now the data are selected on the basis of the response variable, wealth. As we will see, using standard methods (including OLS) on such a sample leads to biased and inconsistent estimators of the βj, even under the zero conditional mean assumption.

Sample Selection (cont.)
• A different setup is when sample selection is not a deterministic function of either x or y, but may be related to them. This includes the problem of missing data, where data are missing on one or more elements of (x, y) for some units drawn randomly from the population.
• Another example is when y is observed only when a certain event is true. A leading example is when y is the log of the "wage offer" – the hourly wage someone could get paid if in the work force.
• We observe the log wage only if the person decides to enter the work force.

Sample Selection (cont.)
• This is generally called the problem of incidental truncation.
• The hallmark of the incidental truncation problem is the notion of "self-selection." For example, we only observe the wage offer if the person "self-selects" into the workforce.
• Whether someone chooses to report, say, their annual income also has a self-selection component.

When can sample selection be ignored?
• We need to know how the sample selection mechanism is related to (x, y).
• In statistics, if selection is independent of (x, y), then the data are said to be missing completely at random.
• Another sufficient condition for consistent estimation is E(u | x, s) = E(u | x) = 0.
Hence, there is no correlation between selection and u.
• Another sufficient condition for consistent estimation is E(y | x, s) = E(y | x) = xβ, which means selection can be an arbitrary function of the exogenous variables.
• Generally, though, linear projections are not consistently estimated using a selected sample when selection is a function of x. In other words, even with exogenous sampling we must use a conditional mean assumption in the underlying population.

When can sample selection be ignored?
• If y = xβ + u, E(x′u) = 0, and selection is independent of (x, y), then OLS using the selected sample is consistent for β.
• The case with exogenous selection is very important for sample selection corrections. If we can obtain an equation where the selection indicator is a function of the explanatory variables, we can apply OLS to that equation for consistent estimation.

When can sample selection be ignored?
• An application of the previous results. Suppose the population model is y = xβ + u, E(u | x) = 0, and s, a selection indicator, is correlated with u. But suppose s is a deterministic function of x and z, for a variable z. Further, suppose (u, z) is independent of x. Then
  E(y | x, z) = xβ + E(u | x, z) = xβ + E(u | z),
where the last equality follows from the independence assumption.

When can sample selection be ignored?
• Suppose also that h(z) has zero mean and E(u | z) = ρ h(z) for some function h. Then
  E(y | x, z) = xβ + ρ h(z).
• Now, because s is a function of (x, z), we can use OLS of y on x and h(z), using the selected sample (s = 1), to consistently estimate β and ρ.
• Notice that all variables, including z, only need to be observed when s = 1.

When can sample selection be ignored?
• In effect, controlling for h(z) in the regression on the selected sample solves the sample selection problem. We will use this result later.
• In practice, h(z) depends on unknown parameters that have to be estimated in a first stage.
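The result that controlling for h(z) on the selected sample restores consistency is easy to check by simulation. A minimal Python sketch, with all data-generating numbers chosen for illustration and the simplest case h(z) = z:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Population model: y = 1 + 2 x + u, with E(u | x) = 0 but E(u | z) = 0.8 z.
x = rng.normal(size=n)
z = rng.normal(size=n)               # independent of x
u = 0.8 * z + rng.normal(size=n)
y = 1 + 2 * x + u

# Selection is a deterministic function of (x, z), and is correlated with u.
s = x + z > 0

# OLS of y on x alone, on the selected sample: inconsistent for beta1 = 2,
# because E(u | x, s = 1) depends on x.
Xs = np.column_stack([np.ones(s.sum()), x[s]])
b_short = np.linalg.lstsq(Xs, y[s], rcond=None)[0]

# Adding h(z) = z as a control on the selected sample restores consistency
# for beta1, and the coefficient on z estimates rho = 0.8.
Xl = np.column_stack([np.ones(s.sum()), x[s], z[s]])
b_long = np.linalg.lstsq(Xl, y[s], rcond=None)[0]
```

Here `b_short[1]` is attenuated well below 2, while `b_long[1]` is close to 2: conditioning on h(z) makes selection ignorable because s is deterministic in (x, z).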
Selection on the Response Variable: Truncated Regression
• Now consider the case where the rule for observing a data point depends in a known, deterministic way on the response variable. Start with the premise that we are interested in the distribution of y in a given population.
• For simplicity, assume y has a continuous distribution. Let (xi, yi) denote a random draw from the population, but suppose we only observe (or, at least, we only use) the data point if si = 1.
• Assume the rule is that, for known constants a1 and a2,
  si = 1[a1 < yi < a2].

Selection on the Response Variable: Truncated Regression
• Allow for the cases a1 = −∞ and a2 = +∞.
• While the analysis can be made much more general, assume we are primarily interested in E(y | x) = xβ. Using OLS on the selected sample, because selection is a function of y, results in an inconsistent estimator of β.
• In a parametric context, assume that the population conditional density is f(y | x; θ).

Selection on the Response Variable: Truncated Regression
• The density conditional on s = 1 is
  p(y | x, s = 1) = f(y | x; θ) / P(a1 < y < a2 | x) = f(y | x; θ) / [F(a2 | x; θ) − F(a1 | x; θ)],
where F(· | x; θ) is the cumulative distribution function associated with f(· | x; θ).
• Having derived the density p(y | x, s = 1), one can form the likelihood function and use maximum likelihood estimation.

Selection on the Response Variable: Truncated Regression
• When f is normal, this is often called the "truncated Tobit model," but a better name is the truncated normal regression model.
• As with censoring, truncating the sample is costly. We are interested in E(y | x) = xβ in the entire population, but because of the truncated sampling, we must specify all of the conditional distribution of y given x.
• This differs from the censored normal regression model in that we observe no information on units outside the subpopulation with a1 < y < a2.
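The truncated density above leads directly to an MLE. A minimal Python sketch (the data-generating numbers, truncation point, and starting values are illustrative assumptions), for truncation from above at 50, which also shows that plain OLS on the truncated sample is biased:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def trunc_nll(params, y, X, a1=-np.inf, a2=np.inf):
    """Negative log-likelihood of the truncated normal regression model:
    the N(x'beta, sigma^2) density is rescaled, per observation, by the
    remaining probability mass P(a1 < y < a2 | x) = F(a2|x) - F(a1|x)."""
    beta, sigma = params[:-1], np.exp(params[-1])  # log-sigma keeps sigma > 0
    mu = X @ beta
    mass = norm.cdf(a2, mu, sigma) - norm.cdf(a1, mu, sigma)
    return -np.sum(norm.logpdf(y, mu, sigma) - np.log(np.clip(mass, 1e-300, None)))

# Illustrative population: y = 30 + 2 x + u, u ~ N(0, 25); observe only y < 50.
rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0, 10, n)
y = 30 + 2 * x + rng.normal(0, 5, n)
keep = y < 50
X = np.column_stack([np.ones(keep.sum()), x[keep]])

# Naive OLS on the truncated sample: slope attenuated toward zero.
b_ols = np.linalg.lstsq(X, y[keep], rcond=None)[0]

# Truncated-normal MLE recovers the population intercept and slope.
start = [y[keep].mean(), 0.0, np.log(y[keep].std())]
res = minimize(trunc_nll, start, args=(y[keep], X, -np.inf, 50.0))
b_mle = res.x[:2]
```

The MLE is consistent only because the full conditional distribution is correctly specified, which is exactly the cost of truncated sampling noted above.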
In the censored case, we have a random sample of units, which means we observe xi for every unit, and we can use that information in estimation.

Selection on the Response Variable: Truncated Regression
• The censored likelihood function uses additional information in the form of the model for the binary selection indicator (yi uncensored or not), which depends on the parameters β and σ. (Remember, we are not specifying a separate model for si; it is implied by the underlying classical linear model and the censoring.) We can use this information in the censored case because we observe xi even when si = 0.
• In the truncated case, we do not observe this information.
• The same can be shown in the general case with other forms of censoring and other distributions.
• If you have a choice, you should use censored regression, not truncated regression.

Selection on the Response Variable: Truncated Regression
• Suppose we only observe a unit if yi < 50. In Stata, we use the command
  truncreg y x1 ... , ul(50)
• Again, we interpret the results as if we had run a linear regression using a random sample from the entire population. This is much different from applying Tobit to a corner solution.
• It is easy to extend to the case where the limits change with i, so (a1i, a2i). We must assume
  D(yi | xi, a1i, a2i) = D(yi | xi),
which is always true if a1i and a2i are deterministic functions of xi.

Incidental Truncation: A Probit Selection Equation
• Motivation: we are interested in estimating E(w^o | x), where w^o is the wage offer. But we need to recognize that if we randomly sample adults, some will not be working, so w^o is unobserved.
• Assume the person works only if the wage offer exceeds the reservation wage, w^o > w^r.
• This looks like censoring of the wage offer from below, but there is a key difference: we do not observe w^r.
This is called incidental truncation. (Perhaps "incidental censoring" would be a better name, as we can generally draw a random sample from the population of working-age adults and then observe other attributes.)

Incidental Truncation: A Probit Selection Equation
• We observe w^o if zγ + v > 0.
• General model:
  y1 = xβ + u
  y2 = 1[zγ + v > 0],
where y1 is the response that is only partially observed, and y2 is now the selection indicator.
• Assumptions: (a) (z, y2) are always observed, and y1 is observed only when y2 = 1; (b) (u, v) is independent of z, with zero means; (c) v ~ Normal(0, 1); (d) E(u | v) = ρv.
• So, we can think of a random draw (x, z, y1, y2) from the population, but we only observe y1 if y2 = 1.

Incidental Truncation: A Probit Selection Equation
• This is sometimes called the Type II Tobit model, but it is important to recognize it as a sample selection model. Not surprisingly, it has some statistical similarities with the "selection model" for corner solutions we discussed previously. But it does not make sense to set y1 = 0, say, just because we do not observe it. (In the wage offer example, that would mean setting w^o equal to 0 whenever we do not observe it.)
• Contrast the sample selection setup with the case of charitable contributions. In that case it makes sense to have y1 = 0 when y2 = 0.

Incidental Truncation: A Probit Selection Equation
• Joint normality of (u, v) is not necessary for a two-step estimation method, but it is often imposed for a (partial) MLE analysis.
• Because v is independent of z and standard normal, y2 follows a probit:
  P(y2 = 1 | z) = Φ(zγ).
• Because (z, y2) are assumed to always be observed, γ is identified, and so we can treat it as known for the purposes of deriving an estimating equation for β.

Incidental Truncation: A Probit Selection Equation
• How can we obtain an estimating equation for β?
Under the previous assumptions,
  E(y1 | z, v) = xβ + E(u | z, v) = xβ + E(u | v) = xβ + ρv.
• If we could observe (or, in effect, estimate) v, we could solve the selection problem by adding v as a regressor and using OLS on the selected sample.

Incidental Truncation: A Probit Selection Equation
• But we only observe y2 = 1[zγ + v > 0]. So we need to obtain E(y1 | z, y2). Because y2 is a function of (z, v), we can apply iterated expectations:
  E(y1 | z, y2) = E[E(y1 | z, v) | z, y2] = xβ + ρ E(v | z, y2).
• When y2 = 1[zγ + v > 0] and v ~ Normal(0, 1), E(v | z, y2 = 1) has a well-known form: it is the inverse Mills ratio, λ(zγ) = φ(zγ)/Φ(zγ).
• Therefore, on the selected sample we have
  E(y1 | z, y2 = 1) = xβ + ρ λ(zγ).

Incidental Truncation: A Probit Selection Equation
• If we just regress y1 on x using the y2 = 1 sample, then, in effect, we omit the variable λ(zγ) from the regression. (It is possible that, in the subpopulation with y2 = 1, λ(zγ) is uncorrelated with x, in which case OLS would be consistent for the slopes in β. But this would be a fluke and cannot be relied on.)
• The equation for E(y1 | z, y2 = 1) is properly viewed as an estimating equation, not a model that we begin with.
• For the identification of β, it is important that z contains at least one element that is not in x.

Incidental Truncation: A Probit Selection Equation
• The expression for E(y1 | z, y2 = 1) suggests a simple two-step estimation method:
  (i) Estimate a probit of y2 on z using all of the data, to obtain the estimate γ̂ and the IMR λ(zγ̂).
  (ii) Run OLS of y1 on x and the IMR in the selected subsample.
• This has been called the Heckit method, after Heckman (1976).

Incidental Truncation: A Probit Selection Equation
• When we write the equations for y1 and y2, we call the first equation the "regression equation" and the second the "selection equation."
• We are using this procedure to solve a missing data problem, or a sample selection problem. Thus, we are interested in estimating β.
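The two-step procedure can be sketched in a few lines. Below is a minimal Python implementation (the simulated data, the hand-rolled Newton probit, and all numbers are illustrative assumptions, not Stata's implementation; the standard errors would still need the two-step adjustment):

```python
import numpy as np
from scipy.stats import norm

def heckman_two_step(y, X, Z, s):
    """Heckit: (i) probit of s on Z using all observations, (ii) OLS of
    y on [X, inverse Mills ratio] using only the selected (s == 1) sample.
    X and Z include constants; Z should contain a variable excluded from X.
    Returns the step-2 coefficients; the last one multiplies the IMR."""
    # Step 1: probit via Newton's method with the expected information.
    g = np.zeros(Z.shape[1])
    for _ in range(25):
        xb = Z @ g
        pdf = norm.pdf(xb)
        cdf = np.clip(norm.cdf(xb), 1e-10, 1 - 1e-10)
        score = Z.T @ (pdf * (s - cdf) / (cdf * (1 - cdf)))
        info = (Z * (pdf**2 / (cdf * (1 - cdf)))[:, None]).T @ Z
        g = g + np.linalg.solve(info, score)
    imr = norm.pdf(Z @ g) / norm.cdf(Z @ g)   # inverse Mills ratio at z*gamma-hat
    # Step 2: OLS on the selected sample with the IMR appended.
    sel = s == 1
    Xs = np.column_stack([X[sel], imr[sel]])
    return np.linalg.lstsq(Xs, y[sel], rcond=None)[0]

# Illustrative check: y1 = 1 + 2 x + u, selection on z, corr(u, v) = 0.5.
rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
z2 = rng.normal(size=n)                       # excluded from the outcome equation
u, v = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n).T
s = (0.5 + x + z2 + v > 0).astype(float)
y = 1 + 2 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, z2])
coef = heckman_two_step(y, X, Z, s)           # ~ (1, 2, rho*sigma_u = 0.5)
```

The last coefficient estimates ρ (times the standard deviation of u), so its t statistic is the usual informal test for selection bias.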
In the case of censored models, the partial effects we want are much more complicated.
• We should adjust our standard errors and inference for the two-step estimation. Many packages, including Stata, make the adjustment routinely. Bootstrapping is also valid.

Incidental Truncation: A Probit Selection Equation
• Technically, the procedure goes through with z = x, that is, without an exclusion restriction. But then identification of β is possible only because the IMR is a nonlinear function.
• Generally, we should be hesitant to achieve identification "off of a nonlinearity." We cannot really tell whether the IMR is statistically significant because selection is an issue or because the functional form E(y1 | x) = xβ is mis-specified (in the population).

Incidental Truncation: A Probit Selection Equation
• We are assuming that E(y1 | x, z) (the population regression) does not depend on z. The only reason E(y1 | z, y2 = 1) depends on z is that z predicts selection and selection is correlated with u.
• Often, over the range of zγ̂ in the data, the IMR is pretty close to linear.
• Very high collinearity is usually present unless z contains something not in x that is useful for predicting selection.

Incidental Truncation: A Probit Selection Equation
• If we assume (u, v) is bivariate normal, then we can apply partial MLE. It is "partial" because we can only use y1 when y2 = 1. See Wooldridge (2010) for the log-likelihood function. The MLE is more efficient if joint normality holds, and the standard errors are readily available.
• On the other hand, Heckit is more robust because it does not assume joint normality. Only v is assumed to be normal, along with E(u | v) = ρv.
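The near-linearity of the IMR is easy to check numerically. A small Python sketch (the index range is an illustrative assumption about where the probit index typically falls):

```python
import numpy as np
from scipy.stats import norm

# Inverse Mills ratio lambda(c) = phi(c) / Phi(c) over a typical index range.
c = np.linspace(-1.0, 2.0, 200)
imr = norm.pdf(c) / norm.cdf(c)

# Correlation between the index and the IMR: strongly negative, near -1,
# so without an exclusion restriction the IMR is nearly collinear with
# a linear function of the regressors.
corr = np.corrcoef(c, imr)[0, 1]
```

This is why an instrument for selection (something in z but not x) matters in practice, even though identification formally goes through without one.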
Heckit example

. heckman lwage c.age c.educ c.exper c.exper#c.exper, select(inlf = c.age c.educ c.exper c.exper#c.exper c.kidslt6 c.kidsge6) twostep

Heckman selection model -- two-step estimates   Number of obs     =        753
(regression model with sample selection)        Selected          =        428
                                                Nonselected       =        325
                                                Wald chi2(4)      =      47.79
                                                Prob > chi2       =     0.0000

---------------------------------------------------------------------------------
                |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
lwage           |
            age |   .0013706   .0061476     0.22   0.824    -.0106784    .0134196
           educ |   .1052471   .0161963     6.50   0.000      .073503    .1369912
          exper |   .0378745   .0184064     2.06   0.040     .0017986    .0739505
c.exper#c.exper |  -.0007546   .0004504    -1.68   0.094    -.0016374    .0001282
          _cons |  -.4892416   .3166662    -1.54   0.122    -1.109896    .1314127
----------------+----------------------------------------------------------------
inlf            |
            age |    -.05629   .0083529    -6.74   0.000    -.0726614   -.0399186
           educ |   .1098675   .0236192     4.65   0.000     .0635747    .1561603
          exper |   .1259602   .0186456     6.76   0.000     .0894154    .1625049
c.exper#c.exper |   -.001843   .0005967    -3.09   0.002    -.0030125   -.0006736
        kidslt6 |  -.8597359   .1174379    -7.32   0.000     -1.08991   -.6295618
        kidsge6 |   .0305573   .0434229     0.70   0.482    -.0545499    .1156645
          _cons |   .4007683     .50461     0.79   0.427    -.5882492    1.389786
----------------+----------------------------------------------------------------
/mills          |
         lambda |  -.0506524   .1773127    -0.29   0.775     -.398179    .2968741
----------------+----------------------------------------------------------------
            rho |   -0.07626
          sigma |  .66416489
---------------------------------------------------------------------------------

MLE example

. heckman lwage c.age c.educ c.exper c.exper#c.exper, select(inlf = c.age c.educ c.exper c.exper#c.exper c.kidslt6 c.kidsge6)

Iteration 0:   log likelihood = -836.05814
Iteration 1:   log likelihood = -836.02682
Iteration 2:   log likelihood = -836.02653
Iteration 3:   log likelihood = -836.02653

Heckman selection model                         Number of obs     =        753
(regression model with sample selection)        Selected          =        428
                                                Nonselected       =        325
                                                Wald chi2(4)      =      56.08
Log likelihood = -836.0265                      Prob > chi2       =     0.0000

---------------------------------------------------------------------------------
                |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
lwage           |
            age |   .0007379   .0054425     0.14   0.892    -.0099292     .011405
           educ |   .1065703    .015049     7.08   0.000     .0770748    .1360659
          exper |   .0400235   .0156297     2.56   0.010     .0093899    .0706571
c.exper#c.exper |  -.0007899    .000421    -1.88   0.061    -.0016151    .0000352
          _cons |  -.5149459   .2944418    -1.75   0.080    -1.092041    .0621494
----------------+----------------------------------------------------------------
inlf            |
            age |  -.0562597   .0083552    -6.73   0.000    -.0726356   -.0398839
           educ |    .109568     .02366     4.63   0.000     .0631953    .1559407
          exper |   .1259921   .0186413     6.76   0.000     .0894557    .1625284
c.exper#c.exper |  -.0018445   .0005963    -3.09   0.002    -.0030132   -.0006759
        kidslt6 |  -.8604047   .1174637    -7.32   0.000    -1.090629   -.6301801
        kidsge6 |   .0308315   .0434492     0.71   0.478    -.0543274    .1159904
          _cons |   .4027357   .5048279     0.80   0.425    -.5867089     1.39218
----------------+----------------------------------------------------------------
        /athrho |   -.031881    .176232    -0.18   0.856    -.3772893    .3135274
       /lnsigma |  -.4103054   .0342879   -11.97   0.000    -.4775084   -.3431024
----------------+----------------------------------------------------------------
            rho |  -.0318702    .176053                     -.3603511    .3036427
          sigma |   .6634476   .0227482                      .6203271    .7095655
         lambda |  -.0211442    .116862                     -.2501895    .2079011
---------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =     0.03   Prob > chi2 = 0.8545

Panel data
• The model is nonlinear, hence unobserved heterogeneity cannot be eliminated by demeaning or first differencing.
• Random effects: write the density conditional on the unobserved heterogeneity ci, f(yi | xi, ci), and then integrate ci out by using a distributional assumption.
• Use the Stata command xtheckman.
• Fixed effects: use the Mundlak term, expressing the unobserved heterogeneity as a function of the time averages of the time-varying regressors – this can be done in both the selection and the regression equation.

Readings
• Wooldridge (2010), Chapter 19 (except the sections on the treatment of endogeneity)
• Hansen (2021), Chapter 27, Sections 27.9–27.10