
Date: 2022-11-20

PPGA503 HW4 prompt 2022
Due Nov 20

This assignment asks you to begin an analysis of health outcomes in Indonesia using real data from the World Bank. The analysis is based on straightforward multiple regression, though it will also require some data management. It will serve as the foundation for your fifth (and final!) homework assignment, in which you will employ more advanced techniques to increase robustness and generate more granular insights.

The dataset, which we’ve modified for the sake of accessibility, is from the World Bank’s Indonesia Database for Policy and Economic Research (INDO-DAPOER). You can access the background material in the following places: here and here. The dataset and a list of variables are available on Canvas. The dataset includes a large number of variables covering health, education, governance, economic, development, and natural resource attributes at the district level in Indonesia.

Note that Indonesia, aside from having the fourth largest population in the world (spread across thousands of islands stretching approximately 5,000 kilometers from west to east), is also among the most decentralized countries in the world. This creates immense variation in both governance and developmental outcomes across the districts.[1]

There are slightly over 500 districts, each of one of two types: kota (city) are more urbanized, while kabupaten (regency) are typically more rural. Provinces, of which there are just under 40, form the meso (middle) tier of government.

When you open the dataset, you’ll immediately see that the variables have long and clunky names. You’ll definitely want to do some renaming to make the relevant variables easier to work with (you don’t need to rename all 150+ variables, of course!). You will also need to do some data management in the form of creating new variables. We ask that you upload both a writeup of your analysis (one question at a time) and the .do file you used to create it.

We want to understand variation in health levels. The variable we will use to capture that is the Morbidity rate (in %). This refers to the rate at which serious disease occurs in a population. It is often used as a measure of the health of a population, as well as to ascertain a population’s healthcare needs.

1. Let’s start by getting a clearer picture of variation in morbidity rates across the districts. Specifically, please answer the question: what does district level variation in morbidity rates look like? In 4-5 sentences (written for an informed and well-trained policy maker), please give a sense of district level variation in morbidity rates using descriptive statistics. In addition, please create a clean figure that visualizes the distribution of morbidity rates and refer to it in your explanation.

2. Now we want to begin to understand what may be driving this variation. You will create a series of models, each a refinement of the previous one. Please create a regression table in which each estimation is a new model (refer back to the slides to see what this should look like!). Let’s start with demographic attributes, which will be model 1. Please estimate a multiple regression with the following independent variables (IVs), and remember that you need to create some of the variables. After you estimate the regression, please interpret the results in detail, as we have in class.
[1] If you are really intrigued by the case and interested in further reading, you could give this a try: “Indonesia’s Decentralization Experiment: Motivations, Successes, and Unintended Consequences” by Ostwald, Tajima, and Samphantharak (2016).

• proportion of the population over 65
• poverty rate

3. We’ll now add in some health infrastructure data. Please include the following two variables in addition to the earlier two. This is your model 2. As above, please interpret the results in detail, as we have in class. Please look at multiple indicators, thinking about whether the additional variables are generally useful.
• physician density (doctors per 1,000 population)
• puskesmas density (community health centres per 1,000 population)

4. Now we’ll add some additional infrastructure data to capture both level of urbanization and transportation infrastructure. Please keep the earlier variables in your model as well. This is model 3. As before, please interpret the results in detail, looking at multiple indicators and thinking about whether the additional variables are generally useful.
• population density
• road density (total length of roads divided by area)

5. Indonesia comprises several major and many smaller islands. For historical reasons, there are some key differences across these islands. We want to make sure that we control for these. To do that, we need dummy variables for the following islands: Java, Sumatra, Bali, Riau, and Kalimantan. You’ll also want a variable for “other islands”. This will take several steps, but fear not, there’s a reasonably easy way of doing it. Do you see the province variable? Well… Java has only 6 provinces (four plus the de facto provinces of Jakarta and Yogyakarta). Bali is a single province, as is Riau. In short, you may not need more than around 25 lines of code in total. Once you’ve done that, please add controls for districts within Java, Sumatra, Bali, Riau, and Kalimantan to the previous model. The reference category will be “other islands”. This will be model 4. As above, please interpret the findings carefully. Did it help to add these controls?
Island Group    Provinces
Java            Jakarta, Yogyakarta, Central Java, East Java, West Java, Banten
Sumatra         Aceh, North Sumatra, West Sumatra, Jambi, Bengkulu, South Sumatra, Lampung
Bali            Bali
Riau            Riau
Kalimantan      Central Kalimantan, North Kalimantan, South Kalimantan, East Kalimantan, West Kalimantan

Good luck and have fun!
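The province-to-island mapping in the table above could be turned into dummy variables along the following lines. This is a hedged sketch under several assumptions: that the province variable is a string called province (a hypothetical name), and that its spellings match the table exactly. The seven-row dataset typed in with input is purely illustrative so that the block runs on its own.

```stata
* Sketch only: `province` is an assumed variable name, and the rows below
* are a tiny illustrative sample, not the real district data.
clear
input str30 province
"Jakarta"
"West Java"
"Aceh"
"Bali"
"Riau"
"East Kalimantan"
"Papua"
end

* inlist() returns 1 when province matches any listed name, 0 otherwise
generate java = inlist(province, "Jakarta", "Yogyakarta", "Central Java", ///
    "East Java", "West Java", "Banten")
generate sumatra = inlist(province, "Aceh", "North Sumatra", "West Sumatra", ///
    "Jambi", "Bengkulu", "South Sumatra", "Lampung")
generate bali = province == "Bali"
generate riau = province == "Riau"
generate kalimantan = inlist(province, "Central Kalimantan", ///
    "North Kalimantan", "South Kalimantan", "East Kalimantan", ///
    "West Kalimantan")

* Everything not assigned to a named group falls into "other islands"
generate other_islands = (java + sumatra + bali + riau + kalimantan) == 0

* Check the assignment by eye
list province java sumatra bali riau kalimantan other_islands
```

Leaving other_islands out of the regression and including the other five dummies makes "other islands" the reference category. If the province variable turns out to be numeric or to use different spellings, the conditions would need adjusting, and running tabulate province first is a quick way to check. Note that an empty province string would also land in other_islands under this logic.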
