PPGA503 HW4 prompt 2022
Due Nov 20

This assignment asks you to begin an analysis of health outcomes in Indonesia using real data from the World Bank. The analysis is based on straightforward multiple regression, though it will also require some data management. It will serve as the foundation for your fifth (and final!) homework assignment, in which you will employ more advanced techniques to increase robustness and generate more granular insights.

The dataset, which we’ve modified for the sake of accessibility, is from the World Bank’s Indonesia Database for Policy and Economic Research (INDO-DAPOER). You can access the background material in the following places: here and here. The dataset and a list of variables are available on Canvas. The dataset includes a large number of variables covering health, education, governance, economic, development, and natural resource attributes at the district level in Indonesia.

Note that Indonesia, aside from having the fourth largest population in the world (spread across thousands of islands stretching approximately 5,000 kilometers from west to east), is also among the most decentralized countries in the world. This creates immense variation in both governance and developmental outcomes across the districts.[1] There are slightly over 500 districts of two types: kota (city) are more urbanized, while kabupaten (regency) are typically more rural. Provinces, of which there are just under 40, form the meso (middle) tier of government.

When you open the dataset, you’ll immediately see that the variables have long and clunky names. You’ll definitely want to do some renaming to make the relevant variables easier to work with (you don’t need to rename all 150+ variables, of course!). You will also need to do some data management in the form of creating new variables.

We ask that you upload both a writeup of your analysis (one question at a time) and the .do file you used to create it.

We want to understand variation in health levels. The variable we will use to capture that is the morbidity rate (in %). This refers to the rate at which serious disease occurs in a population. It is often used as a measure of the health of a population, as well as to ascertain a population’s healthcare needs.
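
As a rough illustration of the renaming and variable creation described above, a .do file fragment might look something like the sketch below. Every name in it is a placeholder rather than an actual name from the Canvas dataset: the file name, the long original variable name, and `morbidity`, `n_doctors`, `population`, and `doc_density` are all assumptions, as is the idea that the dataset reports raw counts that need converting to a rate per 1,000 population. Check the variable list before adapting any of it.

```stata
* Sketch only: the file name and every variable name below are hypothetical.
use "indo_dapoer.dta", clear

* Give a long, clunky variable a short working name
rename morbidity_rate_in_percent morbidity

* Create a new variable: a count converted to a rate per 1,000 population
generate doc_density = (n_doctors / population) * 1000
label variable doc_density "Physicians per 1,000 population"

* Quick check that the new variable looks plausible
summarize morbidity doc_density
```

Labelling each new variable as you create it keeps later tables and figures readable.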

[1] If you are really intrigued by the case and interested in further reading, you could give this a try: “Indonesia’s Decentralization Experiment: Motivations, Successes, and Unintended Consequences” by Ostwald, Tajima, and Samphantharak (2016).

1. Let’s start by getting a clearer picture of variation in morbidity rates across the districts. Specifically, please answer the question: what does district-level variation in morbidity rates look like? In 4-5 sentences (written for an informed and well-trained policy maker), please give a sense of district-level variation in morbidity rates using descriptive statistics. In addition, please create a clean figure that visualizes the distribution of morbidity rates and refer to it in your explanation.

2. Now we want to begin to understand what may be driving this variation. You will create a series of models, each a refinement of the previous one. Please create a regression table in which each estimation is a new model (refer back to the slides to see what this should look like!). Let’s start with demographic attributes, which will be model 1. Please estimate a multiple regression with the following IVs (and remember that you need to create some of the variables). After you estimate the regression, please interpret the results in detail, as we have in class.
• proportion of the population over 65
• poverty rate

3. We’ll now add in some health infrastructure data.
Please include the following two variables in addition to the earlier two. This is your model 2. As above, please interpret the results in detail, as we have in class. Please look at multiple indicators, thinking about whether the additional variables are generally useful.
• physician density (doctors per 1,000 population)
• puskesmas density (community health centres per 1,000 population)

4. Now we’ll add some additional infrastructure data to capture both level of urbanization and transportation infrastructure. Please keep the earlier variables in your model as well. This is model 3. As before, please interpret the results in detail, looking at multiple indicators and thinking about whether the additional variables are generally useful.
• population density
• road density (total length of roads divided by area)

5. Indonesia consists of several major and many smaller islands. For historical reasons, there are some key differences across these islands. We want to make sure that we control for these. To do that, we need dummy variables for the following islands: Java, Sumatra, Bali, Riau, and Kalimantan. You’ll also want a variable for “other islands”. This will take several steps, but fear not, there’s a reasonably easy way of doing it. Do you see the province variable? Well… Java has only 6 provinces (four plus the de facto provinces of Jakarta and Yogyakarta). Bali is a single province, as is Riau. In short, you may not need more than around 25 lines of code in total. Once you’ve done that, please add controls for districts within Java, Sumatra, Bali, Riau, and Kalimantan to the previous model. The reference category will be “other islands”. This will be model 4. As above, please interpret the findings carefully. Did it help to add these controls?
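
One possible way to build the island dummies from the province variable is sketched below. It assumes that the variable is a string named `province` and that its values are spelled exactly as in the island group table that follows; both are assumptions, and the dataset may well use different spellings, so tabulate the variable first. The dummy names (`java`, `sumatra`, `bali`, `riau`, `kalimantan`, `other_islands`) are simply illustrative choices.

```stata
* Sketch only. Assumes a string variable named province whose values are
* spelled exactly as in the island group table; check with: tabulate province
generate java = inlist(province, "Jakarta", "Yogyakarta", "Central Java", ///
    "East Java", "West Java", "Banten")
generate sumatra = inlist(province, "Aceh", "North Sumatra", "West Sumatra", ///
    "Jambi", "Bengkulu", "South Sumatra", "Lampung")
generate bali = (province == "Bali")
generate riau = (province == "Riau")
generate kalimantan = inlist(province, "Central Kalimantan", ///
    "North Kalimantan", "South Kalimantan", "East Kalimantan", ///
    "West Kalimantan")

* Districts matched by none of the five dummies form the reference category
generate other_islands = !(java | sumatra | bali | riau | kalimantan)

* Sanity check: each district should fall into exactly one group
assert java + sumatra + bali + riau + kalimantan + other_islands == 1
```

Including the five island dummies in the regression while leaving `other_islands` out is what makes “other islands” the reference category.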

Island Group    Provinces
Java            Jakarta, Yogyakarta, Central Java, East Java, West Java, Banten
Sumatra         Aceh, North Sumatra, West Sumatra, Jambi, Bengkulu, South Sumatra, Lampung
Bali            Bali
Riau            Riau
Kalimantan      Central Kalimantan, North Kalimantan, South Kalimantan, East Kalimantan, West Kalimantan

Good luck and have fun!