CUHK Business School
Causal Inference for the Impact of the Medicaid Program
DSME 6756BA/B: Business Intelligence Technologies and Applications
Due at 23:59 on Monday, March 7, 2022
Please complete the following task and submit on Blackboard (a) a brief PDF report articulating your
approach and results and (b) a Jupyter Notebook containing your analysis. We have also designed
several sub-questions to guide your analysis. This project counts 10% towards the final
grade for this course, which means the two projects together count for 22% and you have 2%
extra credit for the course projects. You may discuss this project with anyone, but
you should perform the analysis and write the report on your own. Please make the PDF report
as concise as possible without compromising on quality and clarity.
Background
In 2008, a group of uninsured low-income adults in Oregon was selected by lottery to be given
the chance to apply for Medicaid. This lottery provides a unique opportunity to gauge the effects
of expanding access to public health insurance on the health care use, financial strain, and
health of low-income adults using a randomized controlled design. The Oregon Health Insurance
Experiment followed and compared those selected in the lottery (treatments) with those
not selected (controls). You may visit this website
https://www.nber.org/programs-projects/projects-and-centers/oregon-health-insurance-experiment?page=1&perPage=50
for more information about this experiment and the subsequent research and public policy that emerged
based on this experiment.
Your job in this project is to estimate the causal effect of being selected by the lottery and
enrolling in the Medicaid program on emergency department utilization. You may read Taubman
et al. (2014) for some background information (available in the Reference folder on GitHub).
Data
Please download the datasets and the relevant documentation from GitHub. The datasets are
stored in .dta format (the data format of Stata; so you can find some code to manipulate the data
and do the analysis in Stata on GitHub as well). You can load .dta data as a Pandas data frame
using the function pd.read_stata("oregonhie_descriptive_vars.dta").
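As a minimal sketch of the loading step, the snippet below writes a tiny made-up table to a temporary .dta file and reads it back with pd.read_stata. The file name and values are placeholders, not the OHIE data. The convert_categoricals=False option is one possible choice, not a requirement: it keeps labelled Stata values as raw numeric codes, which can be convenient for regressions.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for a Stata file; the values are made up for illustration.
toy = pd.DataFrame({"person_id": [1, 2, 3], "treatment": [1, 0, 1]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_descriptive_vars.dta")  # hypothetical file name
    toy.to_stata(path, write_index=False)

    # convert_categoricals=False keeps labelled Stata values as numeric codes
    # instead of converting them to pandas Categoricals.
    df = pd.read_stata(path, convert_categoricals=False)

print(df.shape)
print(df.dtypes)
```

With the real files, the same call would take the path to the downloaded .dta file instead of the temporary one.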
Randomization and Treatment Assignment. Oregon selected roughly 30,000 individuals
by lottery from a waiting list of about 90,000 for an otherwise closed Medicaid program. The state
conducted eight lottery drawings from March through September 2008. Selected individuals won
the opportunity – for themselves and any household member – to APPLY for health insurance
benefits through a Medicaid program called Oregon Health Plan Standard (OHP Standard). OHP
Standard provides benefits to low-income adults who are not categorically eligible for Oregon’s
traditional Medicaid program (OHP Plus). To be eligible, individuals must be adults aged 19 to
64, not otherwise eligible for Medicaid or other public insurance, Oregon residents, and U.S. citizens
or legal immigrants; they must also have been without health insurance for six months, have income
below the federal poverty level, and have assets below $2,000. Individuals chosen by
the lottery who completed the application process and met the eligibility criteria were enrolled in
OHP Standard. After some selection rules are applied, the data set contains 74,922 individuals.
Of these individuals, 29,834 were selected as treatments (i.e. won the lottery and were given the
chance to apply for health insurance); treatment status is indicated by the variable treatment in
oregonhie_descriptive_vars.dta.
Crucially, the lottery selected individuals, but the opportunity to apply for health insurance
was extended to all household members of lottery winners: treatment selection is random
only conditional on the number of household members on the waiting list (this is given
by the variable numhh_list in oregonhie_descriptive_vars.dta). For example, an individual
could sign up himself or herself as well as a spouse for the lottery, and both have equal probability of
being chosen. Thus, this person and his or her spouse are roughly twice as likely to win the opportunity to
apply for health insurance as someone who only added their own name to the list, without adding
other household members. In short, those in a larger household are more likely to be selected into
the treatment condition.
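The household-size argument above can be checked with a small simulation. This is a sketch under simplifying assumptions that are not taken from the OHIE documentation: every listed person is drawn independently with the same hypothetical probability p_draw, households list either one or two people, and a household counts as treated if any listed member is drawn.

```python
import numpy as np

rng = np.random.default_rng(0)

n_households = 200_000
p_draw = 0.1  # hypothetical per-person chance of being drawn

# Made-up mix: each household lists one or two people with equal probability.
size = rng.integers(1, 3, size=n_households)

# Each listed person is drawn independently; the household is treated
# if at least one listed member is drawn.
drawn_first = rng.random(n_households) < p_draw
drawn_second = (rng.random(n_households) < p_draw) & (size == 2)
treated = drawn_first | drawn_second

rate_1 = treated[size == 1].mean()
rate_2 = treated[size == 2].mean()
print(f"P(treated | 1 person listed) = {rate_1:.3f}")
print(f"P(treated | 2 people listed) = {rate_2:.3f}")
```

Under these assumptions the two-person rate is 1 - (1 - p_draw)^2, which is close to, but slightly below, twice the one-person rate. Within a given household size, however, treatment is still assigned at random.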
Merging Datasets. All datasets contain observations at the individual level. Observations
can be linked across different .dta files by the unique identifier person_id, which appears in all
datasets. No other variable appears across multiple datasets.
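A minimal sketch of the linking step with pandas, using two toy tables in place of the real files. The column any_visit and the flag in_ed_sample are hypothetical names introduced for illustration; only person_id, treatment, and numhh_list come from the text above.

```python
import pandas as pd

# Toy stand-ins for two of the .dta files; all values are made up.
descriptive = pd.DataFrame(
    {"person_id": [1, 2, 3, 4], "treatment": [1, 0, 1, 0], "numhh_list": [1, 1, 2, 2]}
)
ed = pd.DataFrame({"person_id": [1, 3, 4], "any_visit": [1, 0, 1]})

# A left join keeps every person in the descriptive file.
# validate="one_to_one" raises an error if person_id is duplicated on either side.
merged = descriptive.merge(
    ed, on="person_id", how="left", validate="one_to_one", indicator=True
)

# Flag the rows that were found in the second table, then drop the helper column.
merged["in_ed_sample"] = (merged["_merge"] == "both").astype(int)
merged = merged.drop(columns="_merge")
print(merged)
```

Keeping the left join and a membership flag, rather than an inner join, preserves the full sample for analyses that need it while still allowing a restriction to the matched rows.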
Data Set Descriptions. Below we describe the three datasets used in this task:
oregonhie_descriptive_vars.dta
This data set contains demographic characteristics that were recorded when individuals signed
up for the lottery and lottery selection. You may refer to the code book
oregonhie_descriptivevars_codebook.pdf for descriptions of the variables in this data set.
oregonhie_stateprograms_vars.dta
This data set contains information from the state of Oregon on individuals’ participation in the
following state programs: Medicaid, the Supplemental Nutrition Assistance Program (SNAP), and
Temporary Assistance to Needy Families (TANF). You may refer to the code book
oregonhie_stateprograms_codebook.pdf for descriptions of the variables in this data set.
oregonhie_ed_vars.dta
This data set contains variables derived from administrative data of all visits to twelve hospital
emergency departments in the area of Portland, Oregon. You may refer to the code book
oregonhie_ed_codebook.pdf for descriptions of the variables in this data set.
For more detailed descriptions of the entire data set, please read the documents ohie_startguide.pdf
and ohie_userguide.pdf. All the data and their descriptions can be found here:
https://www.nber.org/research/data/oregon-health-insurance-experiment-data.
Questions
Your job in this task is to estimate the causal effect of being selected by the lottery and
enrolling in the Medicaid program on emergency department utilization. Please address
the following questions. You may need to merge different data sets together.
(a) (3 points) Initial Data Pre-processing and Balance Check. Because the individuals
selected by the lottery have the opportunity to apply for the OHP Standard program, and this
opportunity extends to their household members, you need to create dummy variables for the
number of people in the household on the lottery list. Why should we use dummy variables
instead of a numeric variable in this setting? Because the ED visit
data are only available for the Portland area, we will mainly work with the observations
in this area. Please use the OLS approach to check the balance of the treatment and control
groups for the individuals in the data sample. Specifically, you need to regress each variable
for which you want to conduct a balance check on (i) the treatment variable and (ii) the dummy
variables for the number of people in the household on the lottery list. Please conduct a balance
check for the following variable with the full OHIE data sample (N=74,922):
• Included in the emergency department (ED) sample, i.e., the Portland area (N=24,646).
Please conduct a balance check for the following variables with the ED data sample (N=24,646):
• Year of birth
• Female
• Signed up self for lottery
• Any ED visit, pre-randomization (censored)
• Number of ED visits, pre-randomization (censored)
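The balance regression described in (a) can be sketched on simulated data as follows. Everything here is illustrative: the data-generating numbers are made up, the helper ols is a hypothetical function written for this example, and the standard errors are heteroskedasticity-robust (HC0) and ignore any correlation within households. The covariate is built to depend on household size but not on treatment, so any imbalance that appears without the dummies is spurious.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200_000

# Simulated data: names and numbers are illustrative, not taken from the OHIE files.
numhh = rng.choice([1, 2, 3], size=n, p=[0.7, 0.25, 0.05])
treatment = (rng.random(n) < 0.2 * numhh).astype(float)      # selection depends on size
covariate = 1970 + 2.0 * numhh + rng.normal(0, 10, size=n)   # depends on size only


def ols(y, X):
    """Return OLS coefficients and HC0 robust standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    xtx_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))


const = np.ones(n)
# One dummy per household size, dropping the first level as the baseline.
dummies = pd.get_dummies(pd.Series(numhh), drop_first=True).to_numpy(dtype=float)

b_naive, se_naive = ols(covariate, np.column_stack([const, treatment]))
b_adj, se_adj = ols(covariate, np.column_stack([const, treatment, dummies]))

print(f"treatment coef, no dummies:   {b_naive[1]:.3f} (se {se_naive[1]:.3f})")
print(f"treatment coef, with dummies: {b_adj[1]:.3f} (se {se_adj[1]:.3f})")
```

In this simulation the coefficient on treatment is clearly positive without the dummies and close to zero once they are included, which is the pattern a balanced covariate should show when randomization holds only within household-size groups.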
(b) (3 points) Causal Effect of Being Selected by Lottery. Next, for the data sample in the
Portland area (N=24,646), please estimate the causal effect of being selected by the lottery
on the following outcome:
• Whether an individual was enrolled in any Medicaid program (including OHP
Standard) between the earliest notification date in the sample (10 March 2008) and 30
September 2009.
Please include the necessary features, as appropriate, in your regression model. In particular,
do we need to include the dummy variables for the number of people in the household on the lottery
list? Why or why not? Please discuss and justify your choice of features included in the regression
model. What is the average treatment effect of being selected by the lottery on being enrolled
in any Medicaid program?
(c) (4 points) Causal Effect of Enrolling into a Medicaid Program on ED Visits. Estimate
the average treatment effect of enrolling in a Medicaid program on (i) the probability
of any ED visit during the study period and (ii) the (censored) number of ED visits in the
study period. Again, you need to include certain features in the regressions to remove
the bias and/or reduce the variance of your estimates. Please articulate your identification
strategy, present your model specification, and report your estimation results, including the
95% confidence intervals.
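One possible identification strategy for (c), offered here only as a sketch, is to use lottery selection as an instrument for enrollment and estimate the effect by two-stage least squares. The simulation below is entirely made up: the variable names, the unobserved confounder, the true effect of 0.07, and all probabilities are assumptions chosen for illustration. With a single instrument and a single endogenous regressor, the estimator reduces to (Z'X)^{-1} Z'y, and the standard errors are heteroskedasticity-robust without any household clustering.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 200_000
true_effect = 0.07  # made-up effect of enrollment on the outcome

# Simulated data; every name and parameter is illustrative.
numhh = rng.choice([1, 2], size=n, p=[0.8, 0.2])
z = (rng.random(n) < 0.25 * numhh).astype(float)          # lottery selection
confounder = (rng.normal(size=n) > 0).astype(float)       # unobserved
d = (rng.random(n) < 0.10 + 0.25 * z + 0.2 * confounder).astype(float)  # enrollment
p_y = 0.2 + true_effect * d + 0.2 * confounder + 0.05 * (numhh == 2)
y = (rng.random(n) < p_y).astype(float)                   # e.g. any ED visit

const = np.ones(n)
hh = pd.get_dummies(pd.Series(numhh), drop_first=True).to_numpy(dtype=float)

X = np.column_stack([d, const, hh])  # endogenous regressor plus controls
Z = np.column_stack([z, const, hh])  # instrument plus the same controls

# Just-identified IV estimator, equal to 2SLS in this case.
zx_inv = np.linalg.inv(Z.T @ X)
beta = zx_inv @ (Z.T @ y)

# Heteroskedasticity-robust covariance: A S A' with A = (Z'X)^{-1}.
resid = y - X @ beta
meat = Z.T @ (Z * resid[:, None] ** 2)
se = np.sqrt(np.diag(zx_inv @ meat @ zx_inv.T))

iv_est = beta[0]
ci_low, ci_high = iv_est - 1.96 * se[0], iv_est + 1.96 * se[0]

# First stage and reduced form: effects of selection on enrollment and on the outcome.
first_stage = np.linalg.lstsq(Z, d, rcond=None)[0][0]
reduced_form = np.linalg.lstsq(Z, y, rcond=None)[0][0]

print(f"first stage:  {first_stage:.3f}")
print(f"reduced form: {reduced_form:.3f}")
print(f"IV estimate:  {iv_est:.3f}  95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

The IV estimate equals the reduced-form coefficient divided by the first-stage coefficient, which makes explicit how the answer to (b) feeds into (c). A regression package that supports instrumental variables and clustered standard errors would be a natural choice for the actual analysis.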
Hints
1. This project is essentially a replication of the paper Taubman et al. (2014).
2. If you are more familiar with R or Stata, feel free to use them to finish your analysis. In this
case, you also need to submit your code on Blackboard.
Reference
Taubman, S., H. Allen, B. Wright, K. Baicker, A. Finkelstein. 2014. Medicaid Increases
Emergency-Department Use: Evidence from Oregon’s Health Insurance Experiment. Science. 343, 263-268.